Re: [OMPI users] mpirun hangs: "hello" test in single machine
In fact, we probably do have restrictive firewall settings, as far as I remember; I will check the rules again tomorrow morning. That's very interesting - I would have expected this kind of problem on a cluster, but I had not thought it could also break the internal communication within a single machine.

Thanks, Ralph. I'll let you know whether this was the actual cause of the problem.

Rodrigo

On 04/10/2013 09:46 PM, Ralph Castain wrote:
> Best guess is that there is some issue with getting TCP sockets on the system - once the procs are launched, they need to open a TCP socket and communicate back to mpirun. If the socket is "stuck" waiting to complete the open, things will hang. You might check to ensure there isn't some security setting in place that protects sockets - something like iptables, for example.
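For reference, a quick way to inspect the rules is the following (a sketch, assuming iptables is in use and root access is available; rule position 1 is an illustrative choice):

    # Show the current INPUT chain with packet counters:
    sudo iptables -L INPUT -n -v

    # Open MPI's launched processes connect back to mpirun over TCP even on a
    # single machine, so loopback traffic must be accepted:
    sudo iptables -I INPUT 1 -i lo -j ACCEPT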
Re: [OMPI users] mpirun hangs: "hello" test in single machine
Best guess is that there is some issue with getting TCP sockets on the system - once the procs are launched, they need to open a TCP socket and communicate back to mpirun. If the socket is "stuck" waiting to complete the open, things will hang.

You might check to ensure there isn't some security setting in place that protects sockets - something like iptables, for example.

On Apr 10, 2013, at 11:57 AM, Rodrigo Gómez Vázquez wrote:

> Hi,
>
> I am having trouble running MPI programs on a simulation server. [...]
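A minimal sanity check that local TCP connections complete at all (a sketch; assumes netcat is installed, flag syntax varies between netcat variants, and port 5000 is an arbitrary choice):

    nc -l 5000 &                   # start a listener on the local machine
    echo ok | nc 127.0.0.1 5000    # should print "ok"; a hang here suggests blocked loopback TCP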
[OMPI users] mpirun hangs: "hello" test in single machine
Hi,

I am having trouble running MPI programs on a simulation server. The system has several processors, all in the same node (more details on the specs are in the attachments). The machine is quite new (a few months old), and a user reported that it was not possible to run simulations on multiple processors in parallel.

We use it for CFD simulations with OpenFOAM, which ships with its own build of Open MPI (version 1.5.3; for details, see the "ThirdParty" software folder linked from http://www.openfoam.org/archive/2.1.1/download/source.php). The OS is an Ubuntu 12.04 Server distro (see uname.out in the attachments).

He tried to start a simulation in parallel using the following command:

    ~: mpirun -np 4

As a result, the simulation does not start and there is no error message; the program just seems to be waiting for something. We briefly see the 4 processes with their PIDs in the "top" process list, but only for a few tenths of a second, at 0% CPU and 0.0% memory use. To get the terminal back we have to kill the process.

The same happens with the "hello" program that comes with the Open MPI sources:

    :~$ mpicc hello_c.c -o hello
    :~$ mpirun -np 4 hello

... and here it hangs again.

I tried simpler commands, as recommended for checking the installation:

    :~$ mpirun -np 4 hostname
    simserver
    simserver
    simserver
    simserver
    :~$

That works, and so does "ompi_info".

Since we use the same OpenFOAM version without problems on several other computers running Ubuntu-based distros, I suspected some kind of incompatibility due to the hardware, but I could not pin anything down. I also repeated the tests with the Open MPI version from the Ubuntu repositories (1.4.3) and got the same result.

It would be wonderful if anyone could give me a hint. I am afraid it may turn out to be a complicated issue, so please let me know if any relevant information is missing.

Thanks in advance, guys.

Rodrigo (Europe, GMT+2:00)

Attachments:
- openmpi1.4.3_ompi_info.out.bz2
- uname.out: Linux simserver 3.2.0-23-generic #36-Ubuntu SMP Tue Apr 10 20:39:51 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
- cat_-proc-cpuinfo.out.bz2
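A few commands that can help narrow down a hang like this while it is happening, from a second terminal (a sketch; assumes standard Linux tools are installed, and <PID> is a placeholder for one of the hung process IDs):

    ps aux | grep hello          # confirm the procs are alive but idle
    netstat -an | grep LISTEN    # mpirun should be listening on a TCP port
    strace -p <PID>              # a connect() call that never returns points at blocked local TCP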
Re: [OMPI users] Segmentation fault with HPCC benchmark
Dear Gus Correa,

Thank you for your detailed answer. I have been busy working through your steps, but unfortunately I still have the problem.

1) Yes, I have sudo access to the server, and when I run the test only my two instances are active.

2) There is no problem running the hello program simultaneously on the two instances, but someone told me such programs cannot exercise some factors. The instances are pure installations of Ubuntu Server 12.04, and I have disabled "ufw". Two notes here: Open MPI uses ssh, and I can connect from master to slave with no password. One more odd thing is that the order in the myhosts file matters, i.e., it is always the second machine that aborts the process; even when I am on the master and the master is listed second in the file, it reports that the master aborted.

3, 4) I checked them; in fact, I redid everything from the first step, just installing ATLAS and Open MPI from packages, configured with the 64-bit switch.

5) I used -np 4 with hello; is this sufficient?

6) Yes, I checked auto-tuning (without an input file) too.

One thing I noticed is that a "vnet" interface is created for each instance on the main server. I ran these two commands:

    mpirun -np 2 --hostfile myhosts --mca btl_tcp_if_include eth0,lo hpcc
    mpirun -np 2 --hostfile myhosts --mca btl_tcp_if_exclude vnet0,vnet1 hpcc

In both cases I received nothing, i.e., no error and nothing in the output file; I waited for hours but nothing happened. Could these vnets be causing the problem?

Thank you very much for your consideration.

Best Regards,
Reza
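One more combination that may be worth trying: restricting the runtime's out-of-band channel as well as the MPI traffic to the real NIC, so the vnet interfaces are never touched (a sketch; oob_tcp_if_include is an addition beyond the commands above, and eth0 is assumed to be the physical interface on both hosts):

    # First check which interfaces each host actually has:
    ip addr show

    # Pin both the MPI byte-transfer layer and the out-of-band channel to eth0:
    mpirun -np 2 --hostfile myhosts \
        --mca btl_tcp_if_include eth0 \
        --mca oob_tcp_if_include eth0 hpcc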
Re: [OMPI users] Is Open MPI 1.6.4 viable on Mac OS X 10.6.8 ?
Hi Gus

I feel your pain - that's a pretty old system! I obviously don't have any way to test it, but try configuring OMPI --without-memory-manager and see if that helps.

On Apr 9, 2013, at 9:33 PM, Gustavo Correa wrote:

> Dear Open MPI Pros
>
> Somehow I am stuck offsite and I have to test/develop an MPI program on a super duper
> 2006 vintage Mac PowerBookPro with Mac OS X 10.6.8 (Snow Leopard). [...]
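A minimal sketch of what that rebuild could look like (the prefix path is a placeholder; adjust to your setup):

    cd openmpi-1.6.4
    ./configure --prefix=$HOME/ompi-1.6.4-nomm --without-memory-manager
    make -j2 all && make install
    # Put the new build first in the search path:
    export PATH=$HOME/ompi-1.6.4-nomm/bin:$PATH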
[OMPI users] Greenplum's MR+
Hi,

I am currently testing Open MPI applications running on SLURM, and I wanted to try the MapReduce integration, but I am having trouble finding the actual MR+ package. Both your web pages and SLURM's carry some information about MR+, some of it relatively old, but I could not find any actual software package available. I have been to Greenplum's web page, but their MR product is some remix of MapR, not the MR+ built on Open MPI.

Could anyone tell me whether MR+ has been released already, and if so, where it is? If not, how can I get an early adopter's version?

Many thanks for the help.

Cheers,
Jakub
[OMPI users] Is Open MPI 1.6.4 viable on Mac OS X 10.6.8 ?
Dear Open MPI Pros

Somehow I am stuck offsite and I have to test/develop an MPI program on a super duper 2006 vintage Mac PowerBookPro with Mac OS X 10.6.8 (Snow Leopard). This is a 32-bit machine with a dual-core Intel Core Duo processor and 2GB RAM.

Well, my under-development program using FFTW3 and OMPI 1.6.4 runs flawlessly on Linux, but I am offsite and I have to use the darn Mac, where I get all sorts of weird errors out of the blue, which are very likely associated with the Mac OS X underlying memory management system.

I say so because the OMPI test programs (connectivity_c.c, etc.), which do NOT allocate memory (other than the MPI internal buffers, if any), run correctly, but once I start using dynamically allocated arrays, boomer, it breaks (but only on the Mac).

I enclose one of the error messages below, FYI. [It shows up as a segfault, but the array and buffer boundaries are correct, and the program runs perfectly on Linux. RAM is OK also; my batch of test data is small. No automatic arrays in the code either.]

I read the OMPI FAQ on runtime issues, and a couple of entries mention trouble for OMPI with the Mac OS X memory management scheme. However, those FAQ entries are quite old, refer to the OMPI 1.2 and 1.3 series only, recommend linking to an OMPI library that seems to have been phased out (-lopenmpi-malloc), and didn't shed the light I was hoping for.

So, before I give this effort up as not viable, here are a few questions:

Are there specific recommendations on how to build OMPI 1.6.4 on Mac OS X 10.6.8?
Are there any additional linker flags that should be used to build OMPI applications under OS X?
Are there any runtime options that should be added to mpiexec to make OMPI programs that allocate memory dynamically run correctly on Mac OS X?

Thank you,
Gus Correa

Error message
*************
[1,0]:[Macintosh-72:36578] *** Process received signal ***
[1,0]:[Macintosh-72:36578] Signal: Segmentation fault (11)
[1,0]:[Macintosh-72:36578] Signal code: Address not mapped (1)
[1,0]:[Macintosh-72:36578] Failing at address: 0x6648000
[1,0]:[Macintosh-72:36578] [ 0] 2 libSystem.B.dylib 0x9728c05b _sigtramp + 43
[1,0]:[Macintosh-72:36578] [ 1] 3 ??? 0x 0x0 + 4294967295
[1,0]:[Macintosh-72:36578] [ 2] 4 wcdp3d 0x0001be49 main + 1864
[1,0]:[Macintosh-72:36578] [ 3] 5 wcdp3d 0x27ad start + 53
[1,0]:[Macintosh-72:36578] [ 4] 6 ??? 0x0002 0x0 + 2
[1,0]:[Macintosh-72:36578] *** End of error message ***
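Since the backtrace implicates heap memory, one way to probe further on OS X is Apple's built-in malloc debugging (a sketch; wcdp3d is the program name taken from the backtrace, MallocScribble/MallocGuardEdges are Apple's malloc debugging switches, and -x is Open MPI's flag for exporting environment variables to the launched processes):

    # MallocScribble fills freed memory with a known pattern; MallocGuardEdges
    # puts guard pages around large allocations so overruns fault immediately:
    mpiexec -np 2 -x MallocScribble=1 -x MallocGuardEdges=1 ./wcdp3d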
Re: [OMPI users] mpirun error
Hello,

Thanks for the responses, but I have no idea how to do that. Which environment variables should I look at? How do I find out where Open MPI is installed, and how do I make mpif90 use it?

Thanks,
Pradeep

2013/4/2 Elken, Tom
> > The Intel Fortran 2013 compiler comes with support for Intel's MPI runtime and
> > you are getting that instead of OpenMPI. You need to fix your path for all the
> > shells you use.
> [Tom]
> Agree with Michael, but thought I would note something additional.
> If you are using OFED's mpi-selector to select Open MPI, it will set up
> the path to Open MPI before a startup script like .bashrc gets processed.
> So if you source the Intel Compiler's compilervars.sh, you will get
> Intel's mpirt in your path before the OpenMPI's bin directory.
>
> One workaround is to source the following _after_ you source the Intel
> Compiler's compilervars.sh in your start-up scripts:
> . /var/mpi-selector/data/openmpi_...sh
>
> -Tom
>
> > On Apr 1, 2013, at 5:12 AM, Pradeep Jha wrote:
> > > /opt/intel/composer_xe_2013.1.117/mpirt/bin/intel64/mpirun: line 96:
> > > /opt/intel/composer_xe_2013.1.117/mpirt/bin/intel64/mpivars.sh: No such file
> > > or directory
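A quick way to see which MPI is being picked up, and to put Open MPI first (a sketch; /usr/lib/openmpi is an assumption - substitute the actual Open MPI install location on your system):

    which mpirun mpif90        # shows which installation is found first in PATH
    mpirun --version           # Open MPI prints "mpirun (Open MPI) x.y.z"
    echo $PATH | tr ':' '\n'   # inspect the search order, one entry per line

    # Prepend the Open MPI directories in your shell start-up file, after any
    # line that sources the Intel compilervars.sh:
    export PATH=/usr/lib/openmpi/bin:$PATH
    export LD_LIBRARY_PATH=/usr/lib/openmpi/lib:$LD_LIBRARY_PATH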