Re: [OMPI users] Problems installing in Cygwin - Problem with GCC 3.4.4
> *** Fortran 90/95 compiler
> checking whether we are using the GNU Fortran compiler... yes
> checking whether g95 accepts -g... yes
> checking if Fortran compiler works... yes
> checking whether g95 and g95 compilers are compatible... no
> configure: WARNING: *** Fortran 77 and Fortran 90 compilers are not
> link compatible
> configure: WARNING: *** Disabling MPI Fortran 90/95 bindings

OK, for that one I think you need to dig into config.log and see exactly what's failing and why.

I can't speak for the developers, but it seems slightly concerning that configure thinks it's using "the GNU Fortran compiler". I feel sure the GNU people would object to g95 being called that.
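For example, something along these lines should point you at the failing test (just a sketch, assuming config.log sits at the top of your build tree):

  grep -n -A 30 "compilers are compatible" config.log
  # the lines following that check should show the exact compile/link
  # commands configure tried and the error output they produced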
[OMPI users] question regarding the configuration of multiple nics for openmpi
Hello,

I am configuring a cluster with multiple NICs for use with Open MPI. I have not found very much information on the best way of setting up my network for Open MPI. At the moment I have a pretty standard setup with a single hostname and a single IP address for each node. Could someone advise me on the following points?

- For each node, should the second IP be on the same subnet as the first, or not?
- Does Open MPI need a separate hostname for each IP?

If there is a webpage describing how best to configure such a network, that would be great.

Many thanks,
Olivier Marsden
[OMPI users] mca btl_openib_flags default value
Bonjour,

Working with OpenMPI 1.2.5 on RHEL 5.2, I noticed a weird default value for this MCA parameter, as printed by ompi_info:

  MCA btl: parameter "btl_openib_flags" (current value: "54")
           BTL flags, added together: SEND=1, PUT=2, GET=4 (cannot be 0)

Is this expected or not? I could understand any value between 1 and 7, but what does 54 mean, please? Does it behave like 6, after removal of the unexpected bits?

Thanks, Gilbert

--
*-*
Gilbert Grosdidier                  gilbert.grosdid...@in2p3.fr
LAL / IN2P3 / CNRS                  Phone : +33 1 6446 8909
Faculté des Sciences, Bat. 200      Fax   : +33 1 6446 8546
B.P. 34, F-91898 Orsay Cedex (FRANCE)
-
[OMPI users] OpenMPI-1.2.7 + SGE
Hi all,

In a Rocks-5.0 cluster, OpenMPI-1.2.6 comes by default. I guess it gets installed through rpm.

  # /opt/openmpi/bin/ompi_info | grep gridengine
           MCA ras: gridengine (MCA v1.0, API v1.3, Component v1.2.6)
           MCA pls: gridengine (MCA v1.0, API v1.3, Component v1.2.6)

Now I have to install OpenMPI-1.2.7. "./configure --help | grep gridengine" doesn't show anything. In such a scenario, how can OpenMPI-1.2.7 be integrated with SGE?

After achieving this integration:

1. Is it possible to use the -machinefile option in the SGE script? E.g.:

  #$ -pe orte 4
  /opt/openmpi/bin/mpirun -machinefile $TMPDIR/machines -np 4

2. If "qstat -f" shows 2 slots on node1 and 2 slots on node2 for a 4-process Open MPI job, will these processes run exactly on those nodes?

  # qconf -sp orte
  pe_name           orte
  slots             999
  user_lists        NONE
  xuser_lists       NONE
  start_proc_args   /bin/true
  stop_proc_args    /bin/true
  allocation_rule   $fill_up
  control_slaves    TRUE
  job_is_first_task FALSE
  urgency_slots     min

Thank you,
Sangamesh
Consultant - HPC
Re: [OMPI users] question regarding the configuration of multiple nics for openmpi
Hi Olivier and list,

I presume you are talking about Ethernet or GigE. The basic information on how to launch jobs is on the Open MPI FAQ pages:

http://www.open-mpi.org/faq/?category=tcp
http://www.open-mpi.org/faq/?category=tcp#tcp-selection

Here is what I did on our toy/test cluster made of salvaged computers.

1) I use ROCKS cluster, which makes some steps more automatic than described below. However, ROCKS is not needed for this.

2) I actually have three private networks, but you may use, say, two, if your motherboards have dual Ethernet (or GigE) ports. Each node has three NICs, which Linux recognized and activated as eth0, eth1, eth2. Make sure you and Linux agree on which port is eth0, eth1, etc. This may be a bit tricky; the kernel seems to have its own wisdom and mood when it assigns the port names. Ping, lspci, ifconfig, ifup, ifdown, and ethtool are your friends here, and can help you sort out the correct port-name map.

3) For a modest number of nodes, fewer than 8, you can buy inexpensive SOHO-type GigE switches, one for each network, for about $50 apiece. (This is what I did.) For more nodes you would need larger switches. Use Cat5e or Cat6 Ethernet cables and connect the separate networks using the correct ports on the nodes and switches. Well, you may have done that already ...

4) On RHEL or Fedora the essential information is in /etc/sysconfig/network-scripts/ifcfg-eth[0,1,2] on each of your cluster nodes. Other Linux distributions may have equivalent files. You need to edit these files to insert the correct IP address, netmask, and MAC address. For instance, if you have fewer than 254 nodes, you can define private networks like this:

net1) 192.168.1.0 netmask 255.255.255.0 (using the eth0 port)
net2) 192.168.2.0 netmask 255.255.255.0 (using the eth1 port)
net3) 192.168.3.0 netmask 255.255.255.0 (using the eth2 port)
etc.

Here is an example:

[node1] $ cat /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
HWADDR=(put your eth0 port MAC address here)
IPADDR=192.168.1.1   ( ... 192.168.1.2 on node2, etc.)
NETMASK=255.255.255.0
BOOTPROTO=none
ONBOOT=yes

[node1] $ cat /etc/sysconfig/network-scripts/ifcfg-eth1
DEVICE=eth1
HWADDR=(put your eth1 port MAC address here)
IPADDR=192.168.2.1   ( ... 192.168.2.2 on node2, etc.)
NETMASK=255.255.255.0
BOOTPROTO=none
ONBOOT=yes

5) To launch the Open MPI program "my_prog" using the 192.168.2.0 (i.e. "eth1") network with, say, 8 processes, do:

mpiexec --mca btl_tcp_if_include eth1 -n 8 my_prog

(Good if your 192.168.1.0 (eth0) network is already used for I/O, control, etc.)

To be more aggressive, and use both networks, 192.168.1.0 ("eth0") and 192.168.2.0 ("eth1"), do:

mpiexec --mca btl_tcp_if_include eth0,eth1 -n 8 my_prog

***

Works for me. I hope it helps!

Gus Correa

PS - More answers below.

--
Gustavo J. Ponce Correa, PhD - Email: g...@ldeo.columbia.edu
Lamont-Doherty Earth Observatory - Columbia University
P.O. Box 1000 [61 Route 9W] - Palisades, NY, 10964-8000 - USA

Olivier Marsden wrote:

> Hello,
>
> I am configuring a cluster with multiple NICs for use with Open MPI.
> I have not found very much information on the best way of setting up
> my network for Open MPI. At the moment I have a pretty standard setup
> with a single hostname and a single IP address for each node. Could
> someone advise me on the following points?
>
> - For each node, should the second IP be on the same subnet as the
>   first, or not?

No, use separate subnets.

> - Does Open MPI need a separate hostname for each IP?

No, same hostname, but different subnets and different IPs for each port on a given host.

> If there is a webpage describing how best to configure such a network,
> that would be great.

Yes, to some extent. Look at the Open MPI FAQ:

http://www.open-mpi.org/faq/?category=tcp
http://www.open-mpi.org/faq/?category=tcp#tcp-selection

> Many thanks,
> Olivier Marsden
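If you want to double-check which ports Open MPI actually ends up using, something like the following should do it (just a sketch; 192.168.x.2 are the node2 addresses from the example above, and btl_base_verbose simply turns on extra BTL diagnostics):

  # from node1: each private subnet should reach node2 over its own port,
  # since routing follows the subnet
  ping -c 2 192.168.1.2
  ping -c 2 192.168.2.2

  # ask the TCP BTL to report which interfaces it selects at run time
  mpiexec --mca btl_tcp_if_include eth0,eth1 --mca btl_base_verbose 30 -n 8 my_prog

  # and to see the current value of the parameter itself
  ompi_info --param btl tcp | grep if_include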
Re: [OMPI users] OpenMPI-1.2.7 + SGE
Hi,

On 04.11.2008, at 16:54, Sangamesh B wrote:

> Hi all,
>
> In a Rocks-5.0 cluster, OpenMPI-1.2.6 comes by default. I guess it gets
> installed through rpm.
>
>   # /opt/openmpi/bin/ompi_info | grep gridengine
>            MCA ras: gridengine (MCA v1.0, API v1.3, Component v1.2.6)
>            MCA pls: gridengine (MCA v1.0, API v1.3, Component v1.2.6)
>
> Now I have to install OpenMPI-1.2.7. "./configure --help | grep gridengine"
> doesn't show anything. In such a scenario, how can OpenMPI-1.2.7 be
> integrated with SGE?

Only for 1.3 must it be compiled with --with-sge; not in 1.2.x.

> After achieving this integration:
>
> 1. Is it possible to use the -machinefile option in the SGE script? E.g.:
>
>   #$ -pe orte 4
>   /opt/openmpi/bin/mpirun -machinefile $TMPDIR/machines -np 4

You don't need this. Open MPI will use the correct cores on its own. Just specify:

mpirun -np $NSLOTS mypgm

> 2. If "qstat -f" shows 2 slots on node1 and 2 slots on node2 for a
> 4-process Open MPI job, will these processes run exactly on those nodes?

qstat is only an output of what is granted to the job. With a bad configuration you could start all forks on the master node of the parallel job and leave the slaves idling. Open MPI will do the right thing on its own.

-- Reuti

> # qconf -sp orte
> pe_name           orte
> slots             999
> user_lists        NONE
> xuser_lists       NONE
> start_proc_args   /bin/true
> stop_proc_args    /bin/true
> allocation_rule   $fill_up
> control_slaves    TRUE
> job_is_first_task FALSE
> urgency_slots     min
>
> Thank you,
> Sangamesh
> Consultant - HPC
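For example, a complete submission script under this tight integration can be as simple as the following (a sketch; the job name and "mypgm" are placeholders, and the PE name matches the "orte" PE from your qconf output):

  #!/bin/sh
  #$ -N ompi_test
  #$ -cwd
  #$ -pe orte 4
  /opt/openmpi/bin/mpirun -np $NSLOTS ./mypgm

SGE sets $NSLOTS to the number of granted slots (4 here), and Open MPI picks up the granted host list through the gridengine ras/pls components shown in your ompi_info output.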
Re: [OMPI users] mca btl_openib_flags default value
FWIW, we fixed this help message in the upcoming v1.3. The new help message is:

mca:btl:openib:param:btl_openib_flags:help:BTL bit flags (general flags: SEND=1, PUT=2, GET=4, SEND_INPLACE=8; flags only used by the "dr" PML (ignored by others): ACK=16, CHECKSUM=32, RDMA_COMPLETION=128)

So 54 corresponds to PUT, GET, ACK, CHECKSUM (SEND is implied; IIRC it's somewhat silly that we have SEND as a flag because we assume that all BTLs can do it).

...although I see that the v1.3 message doesn't show the HETEROGENEOUS flag, which is 256. /me goes to fix that...

On Nov 4, 2008, at 8:57 AM, Gilbert Grosdidier wrote:

> Bonjour,
>
> Working with OpenMPI 1.2.5 on RHEL 5.2, I noticed a weird default value
> for this MCA parameter, as printed by ompi_info:
>
>   MCA btl: parameter "btl_openib_flags" (current value: "54")
>            BTL flags, added together: SEND=1, PUT=2, GET=4 (cannot be 0)
>
> Is this expected or not? I could understand any value between 1 and 7,
> but what does 54 mean, please? Does it behave like 6, after removal of
> the unexpected bits?
>
> Thanks, Gilbert

--
Jeff Squyres
Cisco Systems
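Spelling out the arithmetic: 54 = 2 (PUT) + 4 (GET) + 16 (ACK) + 32 (CHECKSUM). A quick shell check, if you want to play with the bits yourself:

  $ echo $(( 2 | 4 | 16 | 32 ))
  54
  $ echo $(( 54 & ~(16 | 32) ))   # mask off the dr-only bits; PUT+GET remain
  6

So for PMLs other than "dr" the extra bits are simply ignored, per the help text above, and the value behaves like 6 (plus the implied SEND).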
[OMPI users] OK, got it installed, but... can't find libraries
I went through the compile process with Open MPI, twice: using g95 and gfortran (the default install on my openSUSE 11.0 setup). It seems to have trouble finding the libraries, in particular libopen-pal.so.0. I've seen shared-library problems with some x86_64 packages that I contribute to on SourceForge, and I'm wondering if this is a known problem with Open MPI? I'm using a TYAN 32-processor SMP machine with openSUSE 11.0 installed. (I tried copying the shared object file(s) to /usr/lib and /usr/lib64.)

This is the STDERR output, first with g95 and then with gfortran:

linux-pouh:/usr/local/openmpi-1.2.8 # ./configure FC=/usr/local/g95-install64bi/bin/x86_64-suse-linux-gnu-g95 --prefix=/usr/local/bin
configure: WARNING: -fno-strict-aliasing has been added to CFLAGS
configure: WARNING: -finline-functions has been added to CXXFLAGS
configure: WARNING: *** Did not find corresponding C type
configure: WARNING: *** Fortran 77 and Fortran 90 compilers are not link compatible
configure: WARNING: *** Disabling MPI Fortran 90/95 bindings
configure: WARNING: On Linux and --with-udapl was not specified
configure: WARNING: Not building the udapl BTL
configure: WARNING: Unknown architecture ... proceeding anyway
configure: WARNING: File locks may not work with NFS. See the Installation and users manual for instructions on testing and if necessary fixing this

linux-pouh:/usr/local/openmpi-1.2.8 # mpif90
mpif90: error while loading shared libraries: libopen-pal.so.0: cannot open shared object file: No such file or directory
linux-pouh:/usr/local/openmpi-1.2.8 #

... now try gfortran ...

/usr/local/openmpi-1.2.8 # ./configure --prefix=/usr/local/bin > configure_STDIO.txt
configure: WARNING: -fno-strict-aliasing has been added to CFLAGS
configure: WARNING: -finline-functions has been added to CXXFLAGS
configure: WARNING: *** Did not find corresponding C type
configure: WARNING: *** Corresponding Fortran 77 type (INTEGER*16) not supported
configure: WARNING: *** Skipping Fortran 90 type (INTEGER*16)
configure: WARNING: On Linux and --with-udapl was not specified
configure: WARNING: Not building the udapl BTL
configure: WARNING: Unknown architecture ... proceeding anyway
configure: WARNING: File locks may not work with NFS. See the Installation and users manual for instructions on testing and if necessary fixing this

linux-pouh:/usr/local/openmpi-1.2.8 # make all install > GFortMakeAllInstall_STDIO.txt
libtool: install: warning: relinking `mca_maffinity_first_use.la'
libtool: install: warning: relinking `mca_maffinity_libnuma.la'
libtool: install: warning: relinking `mca_paffinity_linux.la'
libtool: install: warning: relinking `libopen-rte.la'
libtool: install: warning: relinking `mca_mpool_rdma.la'
libtool: install: warning: relinking `mca_mpool_sm.la'
libtool: install: warning: relinking `mca_pml_cm.la'
libtool: install: warning: relinking `mca_pml_ob1.la'
libtool: install: warning: relinking `mca_rcache_vma.la'
libtool: install: warning: relinking `mca_topo_unity.la'

linux-pouh:/usr/local/openmpi-1.2.8 # mpif90
mpif90: error while loading shared libraries: libopen-pal.so.0: cannot open shared object file: No such file or directory
linux-pouh:/usr/local/openmpi-1.2.8 #

linux-pouh:/usr/local/openmpi-1.2.8 # cd /usr/local/lib
linux-pouh:/usr/local/lib # ls
libmca_common_sm.la        libmpi_cxx.so        libmpi_f77.so.0      libmpi.so.0.0.0       libopen-rte.la
libmca_common_sm.so        libmpi_cxx.so.0      libmpi_f77.so.0.0.0  libopen-pal.la        libopen-rte.so
libmca_common_sm.so.0      libmpi_cxx.so.0.0.0  libmpi.la            libopen-pal.so        libopen-rte.so.0
libmca_common_sm.so.0.0.0  libmpi_f77.la        libmpi.so            libopen-pal.so.0      libopen-rte.so.0.0.0
libmpi_cxx.la              libmpi_f77.so        libmpi.so.0          libopen-pal.so.0.0.0  openmpi
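In case it is just a runtime linker search-path problem, one quick check (a sketch, assuming bash, and assuming the libraries really live in /usr/local/lib as the ls output above shows):

  export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
  # libopen-pal should now resolve instead of showing "not found"
  ldd $(which mpif90) | grep libopen-pal
  mpif90 --version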