[OMPI users] Compilation Failure on Franklin with OpenMPI
Folks,

I am trying to compile Open MPI on Franklin, and after trying a couple of options I am still seeing a compilation failure. I am using the following options:

    export CC=cc
    ./configure --with-portals-libs=/u0/v/vishnu/libportals.a --enable-shared=no

The default version of libportals seems to produce errors, and there were compilation errors when building shared libraries, so I disabled those two options as well. The error is here:

    /opt/cray/xt-asyncpe/3.2/bin/cc: INFO: linux target is being used
    pgcc-Error-Unknown switch: --export-dynamic
    make[2]: *** [opal_wrapper] Error 1
    make[2]: Leaving directory `/u0/v/vishnu/src/mpi/ompi/openmpi-1.3.3/opal/tools/wrappers'
    make[1]: *** [all-recursive] Error 1
    make[1]: Leaving directory `/u0/v/vishnu/src/mpi/ompi/openmpi-1.3.3/opal'
    make: *** [all-recursive] Error 1
    franklin-nid00092[27] openmpi-1.3.3$ ./configure --with-portals-libs=/u0/v/vishnu/libportals.a --enable-shared=no

A compilation script for Franklin (if there is one) would be greatly appreciated.

Thanks and best regards,

--
Abhinav Vishnu, Ph.D.
Research Scientist, High Performance Computing
Pacific Northwest National Laboratory
902 Battelle Boulevard, P.O. Box 999, MSIN: K7-90
Richland, WA 99352, USA
Tel.: (509) 372-4794   Fax: (509) 375-2520
abhinav.vis...@pnl.gov
www.pnl.gov
Re: [OMPI users] Deadlock in MPI_File_write_all on Infiniband
Hi Edgar,

this sounds reasonable. Looking at the program with the debugger, I can see that 15 of the 16 processes wait in PMPI_Allreduce, while the remaining one is in PMPI_Wait. Also, the program works with MVAPICH, and I guess the ADIO source tree is more or less the same (correct me if I'm wrong)? So I will stick with MPI_File_write and wait for 1.3.4 ...

Thanks,
Dorian

Edgar Gabriel wrote:
> I am wondering whether this is really due to the usage of File_write_all. We had a bug in the 1.3 series so far (which will be fixed in 1.3.4) where we lost message segments and thus had a deadlock in Comm_dup if there was communication occurring *right after* the Comm_dup. File_open executes a comm_dup internally. If you replace write_all by write, you are avoiding the communication. If you replace ib by tcp, your entire timing is different and you might accidentally not see the deadlock... Just my $0.02 ...
>
> Thanks
> Edgar
>
> Dorian Krause wrote:
>> Dear list,
>>
>> the attached program deadlocks in MPI_File_write_all when run with 16 processes on two 8-core nodes of an Infiniband cluster. It runs fine when I
>>
>> a) use tcp, or
>> b) replace MPI_File_write_all with MPI_File_write.
>>
>> I'm using Open MPI 1.3.2 (but I checked that the problem also occurs with version 1.3.3). The OFED version is 1.4 (installed via Rocks). The operating system is CentOS 5.2, and I compile with gcc 4.1.2. The Open MPI configure flags are
>>
>>   ../../configure --prefix=/share/apps/openmpi/1.3.2/gcc-4.1.2/ \
>>     --with-io-romio-flags=--with-file-system=nfs+ufs+pvfs2 \
>>     --with-wrapper-ldflags=-L/share/apps/pvfs2/lib \
>>     CPPFLAGS=-I/share/apps/pvfs2/include/ \
>>     LDFLAGS=-L/share/apps/pvfs2/lib LIBS=-lpvfs2 -lpthread
>>
>> The user home directories are mounted via NFS. Is this a problem with the user code, the system, or with Open MPI?
>>
>> Thanks,
>> Dorian
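[Since the attachment itself is not reproduced in the archive, the following is only a minimal sketch of the pattern Edgar describes: MPI_File_open, which dups the communicator internally, immediately followed by the collective MPI_File_write_all. The file name, buffer size, and file view are illustrative assumptions, not Dorian's original test case.]

    /* sketch.c: MPI_File_open (internal comm_dup) followed right away by a
     * collective write; all details below are assumptions for illustration. */
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank, i;
        MPI_File fh;
        double buf[1024];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        for (i = 0; i < 1024; ++i)
            buf[i] = (double)rank;

        /* File_open performs a comm_dup internally (per Edgar's note). */
        MPI_File_open(MPI_COMM_WORLD, "testfile.out",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

        /* Each rank writes its own contiguous block. */
        MPI_File_set_view(fh, (MPI_Offset)rank * sizeof(buf), MPI_DOUBLE,
                          MPI_DOUBLE, "native", MPI_INFO_NULL);

        /* Collective write right after the open: this is where the
         * 1.3.2/1.3.3 runs reportedly hang over openib, while the
         * independent MPI_File_write does not. */
        MPI_File_write_all(fh, buf, 1024, MPI_DOUBLE, MPI_STATUS_IGNORE);

        MPI_File_close(&fh);
        MPI_Finalize();
        return 0;
    }

To match Dorian's setup one would run 16 ranks across two 8-core nodes over openib (e.g. mpirun -np 16 --mca btl openib,sm,self ./sketch); with --mca btl tcp,sm,self the timing is different and, as Edgar notes, the deadlock may simply not show up.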
[OMPI users] bug in MPI_Cart_create?
Looking back through the archives, a lot of people have hit error messages like

> [bl302:26556] *** An error occurred in MPI_Cart_create
> [bl302:26556] *** on communicator MPI_COMM_WORLD
> [bl302:26556] *** MPI_ERR_ARG: invalid argument of some other kind
> [bl302:26556] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)

One of the reasons people *may* be hitting this is what I believe to be an incorrect test in MPI_Cart_create():

    if (0 > reorder || 1 < reorder) {
        return OMPI_ERRHANDLER_INVOKE (old_comm, MPI_ERR_ARG, FUNC_NAME);
    }

reorder is a "logical" argument, and "2.5.2 C bindings" in the MPI 1.3 standard says:

    Logical flags are integers with value 0 meaning “false” and a non-zero value meaning “true.”

So I'm not sure there should be any argument test. We hit this because we (sorta erroneously) were trying to use a GNU build of Open MPI with Intel compilers: gfortran has true=1 while ifort has true=-1. It seems to all work (by luck, I know) except this test.

Are there any other tests like this in Open MPI?

David
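[For concreteness, here is a small C sketch of the case David describes; the 2-D grid and the MPI_Dims_create call are illustrative choices, not David's code. It passes reorder = -1, the value ifort uses for .true.; by 2.5.2 this is a legal non-zero "true", yet the range check quoted above turns it into an MPI_ERR_ARG abort.]

    /* cart_reorder.c: reorder = -1 is a valid logical "true" per MPI-1.3
     * section 2.5.2, but trips the 0/1 range check in Open MPI 1.3's
     * MPI_Cart_create. */
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        MPI_Comm cart;
        int dims[2] = {0, 0};
        int periods[2] = {0, 0};
        int size;
        int reorder = -1;   /* what ifort passes for .true.; gfortran passes 1 */

        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        MPI_Dims_create(size, 2, dims);

        /* With the check quoted above, Open MPI 1.3.x invokes the error
         * handler here (MPI_ERRORS_ARE_FATAL by default) even though the
         * argument is a legal "true". */
        MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, reorder, &cart);

        if (cart != MPI_COMM_NULL)
            MPI_Comm_free(&cart);
        MPI_Finalize();
        return 0;
    }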