[OMPI users] Help: HPL Compiled Problem
Hi,

I'm going to compile HPL using OpenMPI-1.2.4. Here's my Make.Linux_ATHLON_CBLAS file:

#
# ######################################################################
#
# ----------------------------------------------------------------------
# - shell --------------------------------------------------------------
# ----------------------------------------------------------------------
#
SHELL        = /bin/sh
#
CD           = cd
CP           = cp
LN_S         = ln -s
MKDIR        = mkdir
RM           = /bin/rm -f
TOUCH        = touch
#
# ----------------------------------------------------------------------
# - Platform identifier ------------------------------------------------
# ----------------------------------------------------------------------
#
ARCH         = Linux_ATHLON_CBLAS
#
# ----------------------------------------------------------------------
# - HPL Directory Structure / HPL library ------------------------------
# ----------------------------------------------------------------------
#
TOPdir       = /ma/hpl-2.0
INCdir       = $(TOPdir)/include
BINdir       = $(TOPdir)/bin/$(ARCH)
LIBdir       = $(TOPdir)/lib/$(ARCH)
#
HPLlib       = $(LIBdir)/libhpl.a
#
# ----------------------------------------------------------------------
# - MPI directories - library ------------------------------------------
# ----------------------------------------------------------------------
# MPinc tells the C compiler where to find the Message Passing library
# header files, MPlib is defined to be the name of the library to be
# used. The variable MPdir is only used for defining MPinc and MPlib.
#
MPdir        = /ma/openmpi-1.2.4
MPinc        = -I$(MPdir)/include
MPlib        = $(MPdir)/lib/libmpi.so
#
# ----------------------------------------------------------------------
# - Linear Algebra library (BLAS or VSIPL) -----------------------------
# ----------------------------------------------------------------------
# LAinc tells the C compiler where to find the Linear Algebra library
# header files, LAlib is defined to be the name of the library to be
# used. The variable LAdir is only used for defining LAinc and LAlib.
#
LAdir        = /ma/GotoBLAS-1.26
LAinc        =
LAlib        = $(LAdir)/libgoto.a
#
# ----------------------------------------------------------------------
# - F77 / C interface --------------------------------------------------
# ----------------------------------------------------------------------
# You can skip this section if and only if you are not planning to use
# a BLAS library featuring a Fortran 77 interface. Otherwise, it is
# necessary to fill out the F2CDEFS variable with the appropriate
# options. **One and only one** option should be chosen in **each** of
# the 3 following categories:
#
# 1) name space (How C calls a Fortran 77 routine)
#
# -DAdd_            : all lower case and a suffixed underscore (Suns,
#                     Intel, ...), [default]
# -DNoChange        : all lower case (IBM RS6000),
# -DUpCase          : all upper case (Cray),
# -DAdd__           : the FORTRAN compiler in use is f2c.
#
# 2) C and Fortran 77 integer mapping
#
# -DF77_INTEGER=int   : Fortran 77 INTEGER is a C int,   [default]
# -DF77_INTEGER=long  : Fortran 77 INTEGER is a C long,
# -DF77_INTEGER=short : Fortran 77 INTEGER is a C short.
#
# 3) Fortran 77 string handling
#
# -DStringSunStyle  : The string address is passed at the string loca-
#                     tion on the stack, and the string length is then
#                     passed as an F77_INTEGER after all explicit
#                     stack arguments, [default]
# -DStringStructPtr : The address of a structure is passed by a
#                     Fortran 77 string, and the structure is of the
#                     form: struct {char *cp; F77_INTEGER len;},
# -DStringStructVal : A structure is passed by value for each Fortran
#                     77 string, and the structure is of the form:
#                     struct {char *cp; F77_INTEGER len;},
# -DStringCrayStyle : Special option for Cray machines, which uses
#                     Cray fcd (fortran character descriptor) for
#                     interoperation.
#
F2CDEFS      =
#
# ----------------------------------------------------------------------
# - HPL includes / libraries / specifics -------------------------------
# ----------------------------------------------------------------------
#
HPL_INCLUDES = -I$(INCdir) -I$(INCdir)/$(ARCH) $(LAinc) $(MPinc)
HPL_LIBS     = $(HPLlib) $(LAlib) $(MPlib)
#
# ----------------------------------------------------------------------
# - Compile time options -----------------------------------------------
# ----------------------------------------------------------------------
#
# -DHPL_COPY_L           force the copy of the panel L before bcast;
# -DHPL_CALL_CBLAS       call the cblas interface;
# -DHPL_CALL_VSIPL       call the vsip library;
# -DHPL_DETAILED_TIMING  enable detailed timers;
#
# By default HPL will:
#    *) n
Re: [OMPI users] Help: HPL Compiled Problem
On Wed, 22 Jul 2009, Lee Amy wrote:

> Hi,
>
> I'm going to compile HPL using OpenMPI-1.2.4. Here's my
> Make.Linux_ATHLON_CBLAS file.

GotoBLAS needs to be called as Fortran BLAS, so you need to switch from CBLAS to FBLAS.

Daniël Mantione
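P.S. Concretely, that means something like the following in your Make file (a sketch showing only the lines that change; -DAdd_ and -DF77_INTEGER=int are the usual defaults for GotoBLAS built with gcc -- verify for your toolchain):

ARCH         = Linux_ATHLON_FBLAS
F2CDEFS      = -DAdd_ -DF77_INTEGER=int -DStringSunStyle
HPL_OPTS     =        # drop -DHPL_CALL_CBLAS so HPL calls the Fortran 77 BLAS interface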
Re: [OMPI users] Help: HPL Compiled Problem
On Wed, Jul 22, 2009 at 2:20 PM, Daniël Mantione wrote:
>
> On Wed, 22 Jul 2009, Lee Amy wrote:
>
>> Hi,
>>
>> I'm going to compile HPL using OpenMPI-1.2.4. Here's my
>> Make.Linux_ATHLON_CBLAS file.
>
> GotoBLAS needs to be called as Fortran BLAS, so you need to switch from
> CBLAS to FBLAS.
>
> Daniël Mantione

Dear sir,

Thank you very much. I have compiled HPL successfully, but when I start the xhpl program I run into the following problem:

[node101:15416] *** Process received signal ***
[node101:15418] *** Process received signal ***
[node101:15418] Signal: Segmentation fault (11)
[node101:15418] Signal code: Address not mapped (1)
[node101:15418] Failing at address: 0x7fff
[node101:15416] Signal: Segmentation fault (11)
[node101:15416] Signal code: Address not mapped (1)
[node101:15416] Failing at address: 0x7fff
[node101:15418] [ 0] /lib64/libc.so.6 [0x2b7e20aa1c30]
[node101:15418] [ 1] xhpl [0x4259f0]
[node101:15418] *** End of error message ***
[node101:15416] [ 0] /lib64/libc.so.6 [0x2aacfce93c30]
[node101:15416] [ 1] xhpl [0x4259f0]
[node101:15416] *** End of error message ***
mpirun noticed that job rank 0 with PID 15416 on node node101 exited on signal 11 (Segmentation fault).

Here's the uname -a output:

Linux node101 2.6.16.60-0.21-smp #1 SMP Tue May 6 12:41:02 UTC 2008 x86_64 x86_64 x86_64 GNU/Linux

Here's the lsb_release output:

LSB Version: core-2.0-noarch:core-3.0-noarch:core-2.0-x86_64:core-3.0-x86_64:desktop-3.1-amd64:desktop-3.1-noarch:graphics-2.0-amd64:graphics-2.0-noarch:graphics-3.1-amd64:graphics-3.1-noarch

Could you tell me how to fix that? Thank you very much.

Amy
Re: [OMPI users] Help: HPL Compiled Problem
On Wed, 22 Jul 2009, Lee Amy wrote:

> Dear sir,
>
> Thank you very much. I have compiled HPL successfully, but when I
> start the xhpl program I run into the following problem:
>
> mpirun noticed that job rank 0 with PID 15416 on node node101 exited
> on signal 11 (Segmentation fault).
>
> Could you tell me how to fix that?

That error message gives very little information to diagnose the problem. Maybe you can recompile with debug information; then it will print a more meaningful backtrace.

Also, please compare your Makefile with the attached one.

Daniël Mantione

#
# -- High Performance Computing Linpack Benchmark (HPL)
#    HPL - 1.0a - January 20, 2004
#    Antoine P. Petitet
#    University of Tennessee, Knoxville
#    Innovative Computing Laboratories
#    (C) Copyright 2000-2004 All Rights Reserved
#
# -- Copyright notice and Licensing terms:
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
#
# 1. Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
#
# 2. Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions, and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
#
# 3. All advertising materials mentioning features or use of this
# software must display the following acknowledgement:
# This product includes software developed at the University of
# Tennessee, Knoxville, Innovative Computing Laboratories.
#
# 4. The name of the University, the name of the Laboratory, or the
# names of its contributors may not be used to endorse or promote
# products derived from this software without specific written
# permission.
#
# -- Disclaimer:
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY
# OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
# ######################################################################
#
# ----------------------------------------------------------------------
# - shell --------------------------------------------------------------
# ----------------------------------------------------------------------
#
SHELL        = /bin/sh
#
CD           = cd
CP           = cp
LN_S         = ln -s
MKDIR        = mkdir
RM           = /bin/rm -f
TOUCH        = touch
#
# ----------------------------------------------------------------------
# - Platform identifier ------------------------------------------------
# ----------------------------------------------------------------------
#
ARCH         = clustervision-openmpi-intel
#
# ----------------------------------------------------------------------
# - HPL Directory Structure / HPL library ------------------------------
# ----------------------------------------------------------------------
#
TOPdir       = $(HOME)/hpl
INCdir       = $(TOPdir)/include
BINdir       = $(TOPdir)/bin/$(ARCH)
LIBdir       = $(TOPdir)/lib/$(ARCH)
#
HPLlib       = $(LIBdir)/libhpl.a
#
# ----------------------------------------------------------------------
# - Message Passing library (MPI) ---
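P.S. For the debug build it is usually enough to change the compiler flags in the Make file, roughly like this (a sketch; variable names as in the stock HPL templates):

CC           = mpicc
CCFLAGS      = $(HPL_DEFS) -g -O0
LINKER       = mpicc
LINKFLAGS    = -g

Then enable core dumps ("ulimit -c unlimited") before running xhpl and inspect the core with gdb, or start a rank under gdb directly.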
Re: [OMPI users] Help: HPL Compiled Problem
On Wed, Jul 22, 2009 at 2:53 PM, Daniël Mantione wrote:
>
> On Wed, 22 Jul 2009, Lee Amy wrote:
>
>> Dear sir,
>>
>> Thank you very much. I have compiled HPL successfully, but when I
>> start the xhpl program I run into the following problem:
>>
>> mpirun noticed that job rank 0 with PID 15416 on node node101 exited
>> on signal 11 (Segmentation fault).
>>
>> Could you tell me how to fix that?
>
> That error message gives very little information to diagnose the problem.
> Maybe you can recompile with debug information; then it will print a more
> meaningful backtrace.
>
> Also, please compare your Makefile with the attached one.
>
> Daniël Mantione

Thanks. I have used your Makefile to recompile. However, I still run into some odd problems.

I have attached the make output and Makefile. Thank you very much.

Amy

Attachments:
make_1 (binary data)
Make.Linux_PII_FBLAS (binary data)
[OMPI users] Warning: declaration ‘struct MPI::Grequest_intercept_t’ does not declare anything
Hi

I get the warning "declaration ‘struct MPI::Grequest_intercept_t’ does not declare anything" with Open MPI 1.2.4 (compiling under Fedora 10 with the mpic++ wrapper over gcc 4.3.2) and don't know how to solve it. Browsing the Internet I found advice to just ignore it, but I don't believe it is impossible to solve in another way.

I have a correctly working single-threaded program. As soon as I include mpi.h and compile, I get this:

In file included from /usr/include/openmpi/1.2.4-gcc/openmpi/ompi/mpi/cxx/mpicxx.h:246,
                 from /usr/include/openmpi/1.2.4-gcc/mpi.h:1783,
                 from /home/user/NetBeansProjects/Correlation_orig/Correlation/Correlation.cpp:2:
/usr/include/openmpi/1.2.4-gcc/openmpi/ompi/mpi/cxx/request_inln.h:347: warning: declaration ‘struct MPI::Grequest_intercept_t’ does not declare anything

The program still works correctly, but this warning makes me nervous.

Sincerely yours, Alexey.
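P.S. To be concrete: even a do-nothing translation unit triggers the warning here, so it comes from the header alone. Compiled with "mpic++ -c warn.cpp":

// minimal reproducer: including mpi.h by itself produces the warning
#include <mpi.h>

int main() {
    return 0;
}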
Re: [OMPI users] Warning: declaration ‘struct MPI::Grequest_intercept_t’ does not declare anything
Hi Alexey

I don't know how this error message comes about, but have you ever considered using a newer version of Open MPI? 1.2.4 is quite ancient; the current version is 1.3.3:

   http://www.open-mpi.org/software/ompi/v1.3/

Jody

On Wed, Jul 22, 2009 at 9:17 AM, Alexey Sokolov wrote:

> Hi
>
> I get the warning "declaration ‘struct MPI::Grequest_intercept_t’ does
> not declare anything" with Open MPI 1.2.4 (compiling under Fedora 10
> with the mpic++ wrapper over gcc 4.3.2) and don't know how to solve it.
> Browsing the Internet I found advice to just ignore it, but I don't
> believe it is impossible to solve in another way.
>
> I have a correctly working single-threaded program. As soon as I
> include mpi.h and compile, I get this:
>
> In file included from /usr/include/openmpi/1.2.4-gcc/openmpi/ompi/mpi/cxx/mpicxx.h:246,
>                  from /usr/include/openmpi/1.2.4-gcc/mpi.h:1783,
>                  from /home/user/NetBeansProjects/Correlation_orig/Correlation/Correlation.cpp:2:
> /usr/include/openmpi/1.2.4-gcc/openmpi/ompi/mpi/cxx/request_inln.h:347: warning: declaration ‘struct MPI::Grequest_intercept_t’ does not declare anything
>
> The program still works correctly, but this warning makes me nervous.
>
> Sincerely yours, Alexey.
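P.S. If your distribution's packages lag behind, building from the tarball is straightforward -- roughly this (a sketch; pick whatever prefix you like):

tar xjf openmpi-1.3.3.tar.bz2
cd openmpi-1.3.3
./configure --prefix=$HOME/openmpi-1.3.3
make all install
export PATH=$HOME/openmpi-1.3.3/bin:$PATH
export LD_LIBRARY_PATH=$HOME/openmpi-1.3.3/lib:$LD_LIBRARY_PATH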
Re: [OMPI users] Warning: declaration ‘struct MPI::Grequest_intercept_t’ does not declare anything
Hi Jody

As I'm new to Linux it was much simpler for me to use the default Fedora yum installer, and the latest version available through it is still 1.2.4. I've installed the latest 1.3.3 version as you advised and that warning disappeared. I still don't know how and why, but the problem is now solved.

Sincerely yours, Alexey.

On Wed, 2009-07-22 at 09:55 +0200, jody wrote:

> Hi Alexey
>
> I don't know how this error message comes about, but have you ever
> considered using a newer version of Open MPI? 1.2.4 is quite ancient;
> the current version is 1.3.3:
>
>    http://www.open-mpi.org/software/ompi/v1.3/
>
> Jody
>
> On Wed, Jul 22, 2009 at 9:17 AM, Alexey Sokolov wrote:
>
>> Hi
>>
>> I get the warning "declaration ‘struct MPI::Grequest_intercept_t’
>> does not declare anything" with Open MPI 1.2.4 (compiling under
>> Fedora 10 with the mpic++ wrapper over gcc 4.3.2) and don't know how
>> to solve it. Browsing the Internet I found advice to just ignore it,
>> but I don't believe it is impossible to solve in another way.
>>
>> I have a correctly working single-threaded program. As soon as I
>> include mpi.h and compile, I get this:
>>
>> In file included from /usr/include/openmpi/1.2.4-gcc/openmpi/ompi/mpi/cxx/mpicxx.h:246,
>>                  from /usr/include/openmpi/1.2.4-gcc/mpi.h:1783,
>>                  from /home/user/NetBeansProjects/Correlation_orig/Correlation/Correlation.cpp:2:
>> /usr/include/openmpi/1.2.4-gcc/openmpi/ompi/mpi/cxx/request_inln.h:347: warning: declaration ‘struct MPI::Grequest_intercept_t’ does not declare anything
>>
>> The program still works correctly, but this warning makes me nervous.
>>
>> Sincerely yours, Alexey.
Re: [OMPI users] Help: HPL Compiled Problem
On Wed, 22 Jul 2009, Lee Amy wrote:

> Thanks. I have used your Makefile to recompile. However, I still run
> into some odd problems.
>
> I have attached the make output and Makefile.

I see nothing wrong with the make output?

Daniël Mantione
Re: [OMPI users] Help: HPL Compiled Problem
On Wed, Jul 22, 2009 at 4:41 PM, Daniël Mantione wrote:
>
> On Wed, 22 Jul 2009, Lee Amy wrote:
>
>> Thanks. I have used your Makefile to recompile. However, I still run
>> into some odd problems.
>>
>> I have attached the make output and Makefile.
>
> I see nothing wrong with the make output?
>
> Daniël Mantione

Thanks, I have solved the problem. It turned out to be a problem in the GotoBLAS library.

Thank you very much,

Amy
Re: [OMPI users] Network connection check
I'm not sure what you mean. Open MPI uses the hostname of the machine for general identification purposes. That may or may not be the same as the resolved name that comes back for a given IP interface.

What are you trying to check, exactly?

On Jul 16, 2009, at 1:56 AM, vipin kumar wrote:

> Hi all,
>
> Is there any way to check network connection using HostName in OpenMPI?
>
> Thanks and Regards,
> --
> Vipin K.
> Research Engineer, C-DOTB, India

--
Jeff Squyres
jsquy...@cisco.com
Re: [OMPI users] Network connection check
Hi Jeff,

Thanks for your response.

Actually, the requirement is: how should a C/C++ program running on the "master" node find out whether a "slave" node is reachable (the way we check it with the "ping" command) or not? Because the IP address may change at any time, I am trying to achieve this using the "host name" of the "slave" node. How can this be done?

Thanks & Regards,

On Wed, Jul 22, 2009 at 6:54 PM, Jeff Squyres wrote:

> I'm not sure what you mean. Open MPI uses the hostname of the machine
> for general identification purposes. That may or may not be the same
> as the resolved name that comes back for a given IP interface.
>
> What are you trying to check, exactly?
>
> On Jul 16, 2009, at 1:56 AM, vipin kumar wrote:
>
>> Hi all,
>>
>> Is there any way to check network connection using HostName in OpenMPI?
>>
>> Thanks and Regards,
>> --
>> Vipin K.
>> Research Engineer, C-DOTB, India
>
> --
> Jeff Squyres
> jsquy...@cisco.com

--
Vipin K.
Research Engineer, C-DOTB, India
Re: [OMPI users] Network connection check
On Jul 22, 2009, at 10:05 AM, vipin kumar wrote:

> Actually, the requirement is: how should a C/C++ program running on
> the "master" node find out whether a "slave" node is reachable (the
> way we check it with the "ping" command) or not? Because the IP
> address may change at any time, I am trying to achieve this using the
> "host name" of the "slave" node. How can this be done?

Are you asking to find out this information before issuing "mpirun"? Open MPI does assume that the nodes you are trying to use are reachable.

--
Jeff Squyres
jsquy...@cisco.com
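P.S. If you need such a pre-flight check in your own code, it doesn't require MPI at all. Here is an untested sketch using plain sockets -- it resolves the current address of the hostname and attempts a TCP connect (port 22 assumes sshd is listening on the slave; "node101" is just a placeholder):

// sketch: is <hostname> resolvable and accepting TCP connections right now?
#include <cstdio>
#include <cstring>
#include <netdb.h>
#include <sys/socket.h>
#include <unistd.h>

static bool host_reachable(const char *hostname, const char *port) {
    struct addrinfo hints, *res = NULL;
    memset(&hints, 0, sizeof(hints));
    hints.ai_family   = AF_UNSPEC;       // IPv4 or IPv6
    hints.ai_socktype = SOCK_STREAM;
    if (getaddrinfo(hostname, port, &hints, &res) != 0)
        return false;                    // name did not resolve
    bool ok = false;
    for (struct addrinfo *ai = res; ai != NULL && !ok; ai = ai->ai_next) {
        int fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0) continue;
        ok = (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0);
        close(fd);
    }
    freeaddrinfo(res);
    return ok;
}

int main(int argc, char **argv) {
    const char *host = (argc > 1) ? argv[1] : "node101";  // placeholder name
    printf("%s is %sreachable\n", host, host_reachable(host, "22") ? "" : "NOT ");
    return 0;
}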
Re: [OMPI users] [Open MPI Announce] Open MPI v1.3.3 released
On Jul 20, 2009, at 9:03 AM, Dave Love wrote:

>> Hmmm...there should be messages on both the user and devel lists
>> regarding binary compatibility at the MPI level being promised for
>> 1.3.2 and beyond.
>
> This is confusing. As I read the quotes below, recompilation is
> necessary, and the announcement has items which suggest at least some
> of the ABI has changed.

The MPI ABI has not changed since 1.3.2.

We started making MPI ABI promises with v1.3.2 -- so any version prior to that (including 1.3.0 and 1.3.1) is not guaranteed to be ABI compatible.

To be clear: you should be able to mpicc/mpif77/etc. an MPI application with Open MPI v1.3.2 and then be able to run it against an Open MPI v1.3.3 installation (e.g., change your LD_LIBRARY_PATH to point to an OMPI v1.3.3 installation).

Note that our internal APIs are *not* guaranteed to be ABI compatible between releases (we try hard to keep them stable between releases in a single series, but it doesn't always work). We're only providing an ABI guarantee for the official MPI API.

> Could the promise also specify that future ABI changes will result in
> ELF version changes, to avoid any more of the mess with the 1.2 and
> 1.3 libraries wrongly appearing as compatible to the dynamic linker?
> It should just be a question of managing changes and doing the right
> thing with libtool.

Yes, we should. This issue has come up before, but it's gotten muddied by some other (uninteresting) technical issues. I'll bring it up again with the rest of the developers.

--
Jeff Squyres
jsquy...@cisco.com
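P.S. In practice the guarantee means that something like this should work (a sketch; the install paths here are hypothetical):

mpicc hello.c -o hello                          # built with the 1.3.2 wrapper
export LD_LIBRARY_PATH=/opt/openmpi-1.3.3/lib:$LD_LIBRARY_PATH
/opt/openmpi-1.3.3/bin/mpirun -np 4 ./hello     # same binary, 1.3.3 libraries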
Re: [OMPI users] ifort and gfortran module
On Jul 20, 2009, at 9:09 AM, Dave Love wrote:

>> you should compile openmpi with each of intel and gfortran separately
>> and install each of them in a separate location, and use mpi-selector
>> to select one.
>
> What, precisely, requires that, at least if you can recompile the MPI
> program with appropriate options? (Presumably it's features of the
> Fortran/C interfacing and/or Fortran runtime, but the former may be
> influenced by compilation options, and I'd hope the glue didn't
> require the compiler runtime -- the Intel compiler is on the list to
> check.)

See https://svn.open-mpi.org/source/xref/ompi_1.3/README#257.

> It's obviously of interest to those of us facing combinatorial
> explosion of libraries we're expected to install.

Indeed. In OMPI, we tried to make this as simple as possible. But unless you use specific compiler options to hide their differences, it isn't possible and is beyond our purview to fix. :-( (similar situation with the C++ bindings)

> Also, is there any reason to use mpi-selector rather than switcher?

Nope -- they do about the same thing.

--
Jeff Squyres
jsquy...@cisco.com
Re: [OMPI users] ifort and gfortran module
Yep, that works. I'm glad that our txt files and "look at argv[0]" scheme was useful in the real world! (we designed it with uses almost exactly like this in mind)

On Jul 20, 2009, at 1:47 PM, Martin Siegert wrote:

Hi,

I want to avoid separate MPI distributions since we compile many MPI software packages. Having more than one MPI distribution (at least) doubles the amount of work. For now I came up with the following solution:

1. compile openmpi using gfortran as the Fortran compiler and install
   it in /usr/local/openmpi
2. move the Fortran module to the directory
   /usr/local/openmpi/include/gfortran. In that directory create
   softlinks to the files in /usr/local/openmpi/include.
3. compile openmpi using ifort and install the Fortran module in
   /usr/local/openmpi/include.
4. in /usr/local/openmpi/bin create softlinks mpif90.ifort and
   mpif90.gfortran pointing to opal_wrapper. Remove the mpif90 softlink.
5. Move /usr/local/openmpi/share/openmpi/mpif90-wrapper-data.txt to
   /usr/local/openmpi/share/openmpi/mpif90.ifort-wrapper-data.txt.
   Copy the file to
   /usr/local/openmpi/share/openmpi/mpif90.gfortran-wrapper-data.txt
   and change the line
   includedir=${includedir}
   to
   includedir=${includedir}/gfortran
6. Create a wrapper script /usr/local/openmpi/bin/mpif90:

#!/bin/bash
OMPI_WRAPPER_FC=`basename $OMPI_FC 2> /dev/null`
if [ "$OMPI_WRAPPER_FC" = 'gfortran' ]; then
   exec $0.gfortran "$@"
else
   exec $0.ifort "$@"
fi

The reason we use gfortran in step 1 is that otherwise you get those irritating error messages from the Intel libraries, cf.
http://www.open-mpi.org/faq/?category=building#intel-compiler-wrapper-compiler-warnings

Cheers,
Martin

--
Martin Siegert
Head, Research Computing
WestGrid Site Lead
IT Services                   phone: 778 782-4691
Simon Fraser University       fax:   778 782-4242
Burnaby, British Columbia     email: sieg...@sfu.ca
Canada V5A 1S6

On Sat, Jul 18, 2009 at 10:03:50AM +0330, rahmani wrote:
> Hi,
> you should compile openmpi with each of intel and gfortran separately
> and install each of them in a separate location, and use mpi-selector
> to select one.
> if you don't use mpi-selector, use the full path of the compiler (for
> example /usr/local/openmpi/intel/bin/mpif90) and add the corresponding
> library to your LD_LIBRARY_PATH
> Mahdi Rahmani
>
> ----- Original Message -----
> From: "Jim Kress"
> To: "Open MPI Users"
> Sent: Saturday, July 18, 2009 5:43:20 AM (GMT+0330) Asia/Tehran
> Subject: Re: [OMPI users] ifort and gfortran module
>
> Why not generate an ifort version with a prefix of
> <what you want for openmpi>_intel and the gfortran version with a
> prefix of <what you want for openmpi>_gcc?
>
> That's what I do and then use mpi-selector to switch between versions
> as required.
>
> Jim
>
> -----Original Message-----
> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org]
> On Behalf Of Martin Siegert
> Sent: Friday, July 17, 2009 3:29 PM
> To: Open MPI Users
> Subject: [OMPI users] ifort and gfortran module
>
> Hi,
>
> I am wondering whether it is possible to support both the Intel
> compiler ifort and gfortran within a single compiled version of
> openmpi. E.g.,
> 1. compile openmpi using ifort as the Fortran compiler and install it
>    in /usr/local/openmpi-1.3.3
> 2. compile openmpi using gfortran, but do not install it; only copy
>    mpi.mod to /usr/local/openmpi-1.3.3/include/gfortran
>
> Is there a way to cause mpif90 to include
> /usr/local/openmpi-1.3.3/include/gfortran
> before including /usr/local/openmpi-1.3.3/include if OMPI_FC is set
> to gfortran (more precisely, if `basename $OMPI_FC` = gfortran)?
>
> Or is there another way of accomplishing this?
>
> Cheers,
> Martin

--
Jeff Squyres
jsquy...@cisco.com
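P.S. For the archives: with that wrapper script in place, selecting the compiler per build is just (assuming the layout from Martin's steps 1-6):

mpif90 -o hello hello.f90                      # default: the ifort module
OMPI_FC=gfortran mpif90 -o hello hello.f90     # picks the gfortran module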
Re: [OMPI users] ifort and gfortran module
On Jul 22, 2009, at 1:37 PM, Jeff Squyres (jsquyres) wrote:

> Yep, that works.

I should clarify -- that *probably* works. The .mod files are essentially precompiled headers. Assuming that all the data types and sizes are the same between gfortran and ifort, you should be ok.

Many of OMPI's F90 functions are implemented by directly calling the back-end F77 functions, but some of them have thin F90 wrappers before calling the back-end F77 functions. If the calling conventions, parameter sizes, and constant values (see the README that I cited earlier in this thread) are all the same, then you should be ok using a single back-end libmpi_f77 and libmpi_f90 with 2 different .mod files. But this is not something I have tested extensively, so I can't give you a definite "this will always work" ruling.

I *think* that there are compiler flags that you can use with ifort to make it behave similarly to gfortran in terms of sizes and constant values, etc. These may or may not be necessary...? You might want to check into this, but I'd be interested to hear what you find.

--
Jeff Squyres
jsquy...@cisco.com
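P.S. One cheap sanity check for the two-.mod scheme (a sketch that assumes Martin's wrapper layout from earlier in this thread; identical output is necessary but not sufficient):

cat > probe.f90 <<'EOF'
program probe
   use mpi
   implicit none
   ! print a couple of compile-time constants baked into each .mod
   print *, 'MPI_STATUS_SIZE      =', MPI_STATUS_SIZE
   print *, 'MPI_MAX_ERROR_STRING =', MPI_MAX_ERROR_STRING
end program probe
EOF
OMPI_FC=ifort    mpif90 -o probe.ifort    probe.f90 && ./probe.ifort
OMPI_FC=gfortran mpif90 -o probe.gfortran probe.f90 && ./probe.gfortran
# the two outputs should be identical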
Re: [OMPI users] Open-MPI-1.3.2 compatibility with old torque?
mpirun --display-allocation --display-map

Run a batch job that just prints out $PBS_NODEFILE. I'll bet that it isn't what we are expecting, and that the problem comes from it. In a Torque environment, we read that file to get the list of nodes and #slots/node that are allocated to your job. We then filter that through any hostfile you provide. So all the nodes have to be in the $PBS_NODEFILE, which has to be in the expected format.

I'm a little suspicious, though, because of your reported error. It sounds like we are indeed trying to launch a daemon on a known node. I can only surmise a couple of possible reasons for the failure:

1. this is a node that is not allocated for your use. Was node0006 in your allocation? If not, then the launch would fail. This would indicate we are not parsing the nodefile correctly.

2. if the node is in your allocation, then I would wonder if you have a TCP connection between that node and the one where mpirun exists. Is there a firewall in the way? Or something that would preclude a connection? Frankly, I doubt this possibility because it works when run manually.

My money is on option #1. :-)

If it is #1 and you send me a copy of a sample $PBS_NODEFILE on your system, I can create a way to parse it so we can provide support for that older version.

Ralph

On Jul 21, 2009, at 4:44 PM, Song, Kai Song wrote:

Hi Ralph,

Thanks a lot for the fast response. Could you give me more instructions on which command I should use "--display-allocation" and "--display-map" with? mpirun? ./configure? ...

Also, we have tested that in our PBS script, if we put node=1, the helloworld works. But when I put node=2 or more, it will hang until timeout, and the error message will be something like:

node0006 - daemon did not report back when launched

However, if we don't go through the scheduler and run MPI manually, everything works fine too:

/home/software/ompi/1.3.2-pgi/bin/mpirun -machinefile ./nodes -np 16 ./a.out

What do you think the problem would be? It's not a network issue, because manually running MPI works. That is why we question the torque compatibility.

Thanks again,
Kai

Kai Song
1.510.486.4894
High Performance Computing Services (HPCS) Intern
Lawrence Berkeley National Laboratory - http://scs.lbl.gov

----- Original Message -----
From: Ralph Castain
Date: Tuesday, July 21, 2009 12:12 pm
Subject: Re: [OMPI users] Open-MPI-1.3.2 compatibility with old torque?
To: Open MPI Users

I'm afraid I have no idea - I've never seen a Torque version that old, however, so it is quite possible that we don't work with it. It also looks like it may have been modified (given the p2-aspen3 on the end), so I have no idea how the system would behave.

First thing you could do is verify that the allocation is being read correctly. Add a --display-allocation to the cmd line and see what we think Torque gave us. Then add --display-map to see where it plans to place the processes.

If all that looks okay, and if you allow ssh, then try -mca plm rsh on the cmd line and see if that works.

HTH
Ralph

On Tue, Jul 21, 2009 at 12:57 PM, Song, Kai Song wrote:

Hi All,

I am building open-mpi-1.3.2 on centos-3.4, with torque-1.1.0p2-aspen3 and myrinet. I compiled it just fine with this configuration:

./configure --prefix=/home/software/ompi/1.3.2-pgi --with-gm=/usr/local/ --with-gm-libdir=/usr/local/lib64/ --enable-static --disable-shared --with-tm=/usr/ --without-threads CC=pgcc CXX=pgCC FC=pgf90 F77=pgf77 LDFLAGS=-L/usr/lib64/torque/

However, when I submit jobs for 2 or more nodes through the torque scheduler, the jobs just hang there. It shows the RUN state, but no communication between the nodes, then the jobs die with a timeout. We have confirmed that the myrinet is working because our lam-mpi-7.1 works just fine.

We are having a really hard time determining the causes of this problem, so we suspect our torque is too old. What is the lowest version requirement of torque for open-mpi-1.3.2? The README file doesn't specify this detail. Does anyone know more about it?

Thanks in advance,
Kai

Kai Song
1.510.486.4894
High Performance Computing Services (HPCS) Intern
Lawrence Berkeley National Laboratory - http://scs.lbl.gov
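P.S. A trivial job for dumping the nodefile could look like this (a sketch; the resource syntax depends on your site):

#!/bin/sh
#PBS -l nodes=2:ppn=2
cat $PBS_NODEFILE

What we expect to read back is one hostname per allocated slot, e.g.:

node0001
node0001
node0002
node0002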
[OMPI users] Tuned collectives: How to choose them dynamically? (-mca coll_tuned_dynamic_rules_filename dyn_rules)"
Dear OpenMPI experts

I would like to experiment with the OpenMPI tuned collectives, hoping to improve the performance of some programs we run in production mode.

However, I could not find any documentation on how to select the different collective algorithms and other parameters. In particular, I would love to read an explanation clarifying the syntax and meaning of the lines of the "dyn_rules" file that is passed to "-mca coll_tuned_dynamic_rules_filename ./dyn_rules".

Recently there was an interesting discussion on the list about this topic. It showed that choosing the right collective algorithm can make a big difference in overall performance:

http://www.open-mpi.org/community/lists/users/2009/05/9355.php
http://www.open-mpi.org/community/lists/users/2009/05/9399.php
http://www.open-mpi.org/community/lists/users/2009/05/9401.php
http://www.open-mpi.org/community/lists/users/2009/05/9419.php

However, the thread concentrated on "MPI_Alltoall". Nothing was said about other collective functions, and not much was said about the "tuned collective dynamic rules" file syntax, the meaning of its parameters, etc.

Is there any source of information about that which I missed?

Thank you for any pointers or clarifications.

Gus Correa

---------------------------------------------------------------------
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
---------------------------------------------------------------------
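P.S. From skimming the source (ompi/mca/coll/tuned/coll_tuned_dynamic_file.c) my best guess at the layout is a nested set of counts and rules, something like the sketch below -- but this is exactly the kind of thing I would like to see confirmed, including what the algorithm numbers map to (ompi_info appears to list them per coll_tuned_*_algorithm parameter):

1           # number of collectives that follow
3           # collective ID (alltoall, if I read coll_tuned.h right)
1           # number of communicator-size rules
64          # communicator-size threshold for this rule
2           # number of message-size rules
0 2 0 0     # msg size, algorithm number (placeholder), fan-in/out, segment size
8192 3 0 0  # msg size, algorithm number (placeholder), fan-in/out, segment size

Apparently it only takes effect together with "-mca coll_tuned_use_dynamic_rules 1".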