I'm afraid there isn't enough info here to advise - I don't know which poll is failing. What function is calling poll?
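One way to answer that question is to attach a debugger to the hung processes and print their stacks. The following is only a sketch, assuming gdb and pgrep are installed on the target machine; `dump_stacks` is a hypothetical helper name, not part of Open MPI.

```shell
# Hypothetical helper: print a backtrace for every thread of each running
# process whose name exactly matches $1. Assumes gdb and pgrep are available
# and that you have permission to attach to the process.
dump_stacks() {
    for pid in $(pgrep -x "$1"); do
        echo "== PID $pid =="
        # -batch runs the given command and exits, detaching from the process.
        gdb -p "$pid" -batch -ex "thread apply all bt" 2>&1
    done
}
```

While the job is hung, something like `dump_stacks my_application` and `dump_stacks mpirun` from a second terminal should show the frames above poll(), provided the binaries were built with symbols.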
Could be a problem with the event library, but I don't know. Have you tried using "-mca btl sm,self" instead of tcp?

On Jul 1, 2011, at 2:37 PM, Colon, Joseanibal wrote:

> I got the LD_LIBRARY_PATH correct and I don’t have other installations on the
> target machine, but it doesn’t fix it. I had the suspicion about
> “./configure” building support for stuff on my machine that is not available
> on the target machine. Unfortunately the machines are not exactly identical,
> definitely not in terms of hardware. The only similarities are the OS and the
> x86_64 architecture (this is OpenSUSE 11, SP1).
>
> As you correctly guessed, I want to run this on a single machine, and all
> processes are local. There is some intercommunication going on as well, but
> all of it uses the MPI API. I am guessing that my problem has to do with
> intercommunication (since strace shows infinite calls to ‘poll()’), probably
> because mpirun is trying to use features that were configured on my machine
> but are not present on the target. Does that make sense?
>
> I figured I don’t need any fancy support to just run a couple of processes in
> parallel locally. What would be the most basic configuration I can use to
> ensure that this will run on my target machine? (a machine that probably
> doesn’t have support for a lot of the components; no IB devices found). I
> want openmpi to use the simplest form available. Thanks!
>
> -Joseanibal
>
>
> From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On
> Behalf Of Ralph Castain
> Sent: Friday, July 01, 2011 3:50 PM
> To: Open MPI Developers
> Subject: Re: [OMPI devel] Question about hanging mpirun
>
> Make sure your LD_LIBRARY_PATH will pick up this installation before anything
> else - it's possible it is picking up an old one.
>
> I take it that you are running this on a single machine? So all the procs are
> local?
>
> Only other issue is that OMPI's configure does a lot of testing to detect the
> local environment. So you might be building support for things that aren't on
> your target machine, and vice versa. If you have to do it this way, you need
> to ensure that the two machines are absolutely identical, both in hardware
> and software (watch for those installed packages!).
>
>
> On Jul 1, 2011, at 10:42 AM, Colon, Joseanibal wrote:
>
>
> My mpi application is hanging forever when called with mpirun -np >1 (that is,
> 2 or more... not actually typing the ‘>’).
>
> So I built openmpi 1.4.3 with default options except I used
> --prefix=/usr/local/openmpi. I compiled an application against it but I need
> to run this application elsewhere. So I brought my entire installation
> directory /usr/local/openmpi to this new machine along with my binary to test
> it. Ran the following command... (If I didn’t use the --mca options it would
> print out messages about missing OpenFabrics):
> /usr/local/openmpi/bin/mpirun --mca btl tcp,self -np 2 ./my_application
>
> This actually works for -np 1. But requesting another process makes the call
> hang forever. ‘strace’ of the above call shows never-ending calls to
> “poll”, each resulting in (timeout).
> Executing /usr/local/openmpi/bin/ompi_info still shows the configure and
> build host as the machine I built on, but I don’t know if this may cause a
> problem. I also see “Thread support: posix (mpi: no, progress: no)”
>
> Unfortunately I need to do it this way. I cannot build openmpi on the target
> machine, so I need to make it portable. This other machine should be the same
> architecture and OS and everything.
>
> I should have solved this yesterday, please help, and thanks!
>
> -Joseanibal
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
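
For reference, the shared-memory suggestion at the top of this thread could be tried roughly as follows. This is a sketch only: the install prefix and binary name are the ones quoted in the thread, `mpirun_cmd` is a hypothetical helper, and the script prints the command lines rather than running them so they can be checked first.

```shell
#!/bin/sh
# Prefix and binary name are taken from the thread; override via the
# environment if yours differ.
OMPI_PREFIX="${OMPI_PREFIX:-/usr/local/openmpi}"
APP="${APP:-./my_application}"

# mpirun_cmd BTL_LIST NP: print (do not run) an mpirun command line that
# restricts Open MPI to the given comma-separated BTL components.
mpirun_cmd() {
    printf '%s/bin/mpirun --mca btl %s -np %s %s\n' \
        "$OMPI_PREFIX" "$1" "$2" "$APP"
}

mpirun_cmd sm,self 2    # shared memory only, as suggested in the reply
mpirun_cmd tcp,self 2   # the TCP variant that was reported to hang

# Command to list the BTL components this installation actually contains.
echo "$OMPI_PREFIX/bin/ompi_info | grep btl"
```

If the sm,self run completes while tcp,self still hangs, that would point at the TCP path rather than at the application itself.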