[OMPI users] Compilation Failure on Franklin with OpenMPI

2009-10-13 Thread Abhinav Vishnu

Folks:

I am trying to compile Open MPI on Franklin, and after trying a couple
of options I am still seeing a compilation failure.


I am using the following options:

export CC=cc
./configure --with-portals-libs=/u0/v/vishnu/libportals.a --enable-shared=no

The default version of libportals seems to produce errors, and there
were compilation errors when building shared libraries, so I changed
both of these settings. The error is shown below:


/opt/cray/xt-asyncpe/3.2/bin/cc: INFO: linux target is being used
pgcc-Error-Unknown switch: --export-dynamic
make[2]: *** [opal_wrapper] Error 1
make[2]: Leaving directory 
`/u0/v/vishnu/src/mpi/ompi/openmpi-1.3.3/opal/tools/wrappers'

make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/u0/v/vishnu/src/mpi/ompi/openmpi-1.3.3/opal'
make: *** [all-recursive] Error 1
franklin-nid00092[27] openmpi-1.3.3$ ./configure 
--with-portals-libs=/u0/v/vishnu/libportals.a --enable-shared=no


A compilation script for Franklin (if there is one) would be greatly 
appreciated.


thanks and best regards,

--
Abhinav Vishnu, Ph.D.
Research Scientist
High Performance Computing

Pacific Northwest National Laboratory
902 Battelle Boulevard
P.O. Box 999, MSIN: K7-90
Richland, WA 99352 USA
Tel.: (509)372-4794
Fax: (509)375-2520
abhinav.vis...@pnl.gov
www.pnl.gov




Re: [OMPI users] Deadlock in MPI_File_write_all on Infiniband

2009-10-13 Thread Dorian Krause

Hi Edgar,

this sounds reasonable. Looking at the program in the debugger, I can
see that 15 of the 16 processes are waiting in PMPI_Allreduce, while
the remaining one is in PMPI_Wait.


Also, the program works with MVAPICH, and I assume the ADIO source tree
is more or less the same (correct me if I'm wrong)?


So I'll stick with MPI_File_write and wait for 1.3.4 ...

Thanks,
Dorian

Edgar Gabriel wrote:
I am wondering whether this is really due to the usage of
File_write_all. We had a bug in the 1.3 series (which will be fixed in
1.3.4) where we lost message segments and thus could deadlock in
Comm_dup if communication occurred *right after* the Comm_dup.
File_open executes a comm_dup internally.


If you replace write_all with write, you avoid that communication. If
you replace ib with tcp, the timing is entirely different and you might
just happen not to see the deadlock...
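
[A minimal, hypothetical sketch of the pattern described above, not
code from this thread: a communicator duplication followed immediately
by other communication. In the reported case the dup happens inside
MPI_File_open and the follow-on traffic comes from the collective
write_all.]

#include <mpi.h>

/* Hypothetical sketch (not code from this thread) of the trigger
 * pattern: a communicator duplication followed immediately by other
 * communication. In the reported case the dup happens inside
 * MPI_File_open and the follow-on traffic comes from the collective
 * MPI_File_write_all. */
int main(int argc, char **argv)
{
    MPI_Comm dup_comm;
    int rank, sum;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Step 1: duplicate the communicator (File_open does this internally). */
    MPI_Comm_dup(MPI_COMM_WORLD, &dup_comm);

    /* Step 2: communicate right after the dup, which is the situation
     * in which the lost-segment bug in the 1.3 series could deadlock. */
    MPI_Allreduce(&rank, &sum, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);

    MPI_Comm_free(&dup_comm);
    MPI_Finalize();
    return 0;
}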


Just my $0.02 ...

Thanks
Edgar

Dorian Krause wrote:

Dear list,

the attached program deadlocks in MPI_File_write_all when run with 16
processes on two 8-core nodes of an Infiniband cluster. It runs fine
when I


a) use tcp
or
b) replace MPI_File_write_all by MPI_File_write

I'm using Open MPI 1.3.2 (but I checked that the problem also occurs
with version 1.3.3). The OFED version is 1.4 (installed via Rocks). The
operating system is CentOS 5.2.


I compile with gcc-4.1.2. The Open MPI configure flags are

  ../../configure --prefix=/share/apps/openmpi/1.3.2/gcc-4.1.2/ \
      --with-io-romio-flags=--with-file-system=nfs+ufs+pvfs2 \
      --with-wrapper-ldflags=-L/share/apps/pvfs2/lib \
      CPPFLAGS=-I/share/apps/pvfs2/include/ \
      LDFLAGS=-L/share/apps/pvfs2/lib \
      LIBS="-lpvfs2 -lpthread"


The user home directories are mounted via NFS.

Is this a problem with the user code, the system, or with Open MPI?
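
[The original attachment is not included here. The following is only a
hypothetical sketch of the kind of program described above, with every
rank performing a collective MPI_File_write_all; the file name and data
layout are arbitrary choices. Running it with something like
"mpirun -np 16 ./a.out" across the two nodes would match the reported
setup.]

#include <mpi.h>

/* Hypothetical sketch of the described scenario (not the original
 * attachment): every rank writes one integer with the collective
 * MPI_File_write_all. The file name and layout are arbitrary. */
int main(int argc, char **argv)
{
    int rank, value;
    MPI_File fh;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    value = rank;

    /* MPI_File_open duplicates the communicator internally. */
    MPI_File_open(MPI_COMM_WORLD, "writeall_test.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY,
                  MPI_INFO_NULL, &fh);

    /* Each rank writes its value at a disjoint offset; the collective
     * variant is the one reported to deadlock over Infiniband. */
    MPI_File_set_view(fh, (MPI_Offset)rank * sizeof(int), MPI_INT,
                      MPI_INT, "native", MPI_INFO_NULL);
    MPI_File_write_all(fh, &value, 1, MPI_INT, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}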

Thanks,
Dorian










[OMPI users] bug in MPI_Cart_create?

2009-10-13 Thread David Singleton


Looking back through the archives, a lot of people have hit error
messages like

> [bl302:26556] *** An error occurred in MPI_Cart_create
> [bl302:26556] *** on communicator MPI_COMM_WORLD
> [bl302:26556] *** MPI_ERR_ARG: invalid argument of some other kind
> [bl302:26556] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)

One of the reasons people *may* be hitting this is what I believe to
be an incorrect test in MPI_Cart_create():

if (0 > reorder || 1 < reorder) {
    return OMPI_ERRHANDLER_INVOKE (old_comm, MPI_ERR_ARG,
                                   FUNC_NAME);
}

reorder is a "logical" argument and "2.5.2 C bindings" in the MPI 1.3
standard says:

Logical flags are integers with value 0 meaning “false” and a
non-zero value meaning “true.”

So I'm not sure there should be any argument test.


We hit this because we were (somewhat erroneously) trying to use a GNU
build of Open MPI with the Intel compilers: gfortran represents .true.
as 1, while ifort uses -1. It all seems to work (by luck, I know)
except for this test. Are there any other tests like this in Open MPI?
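
[For illustration, a minimal hypothetical C example, not taken from the
original post, that mimics what an ifort-compiled caller effectively
does: it passes -1 as the reorder argument, which the range check
quoted above rejects even though the standard only requires a nonzero
value for "true".]

#include <mpi.h>

/* Hypothetical example (not from the original post): pass reorder = -1,
 * which is what an ifort-compiled Fortran caller effectively hands to
 * the C layer for .true.; the strict 0/1 range check quoted above
 * rejects it with MPI_ERR_ARG, although the standard treats any
 * nonzero logical as "true". */
int main(int argc, char **argv)
{
    MPI_Comm cart;
    int dims[2] = {0, 0};
    int periods[2] = {0, 0};
    int nprocs;
    int reorder = -1;   /* nonzero, i.e. "true" per the standard */

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    MPI_Dims_create(nprocs, 2, dims);

    MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, reorder, &cart);

    MPI_Comm_free(&cart);
    MPI_Finalize();
    return 0;
}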

David