[OMPI devel] AlphaServers & OpenMPI

2007-05-08 Thread Rob

Hi,

What is the problem with supporting AlphaServers in
OpenMPI?

As for the alternatives: MPICH1 (very old) supports
AlphaServers, and MPICH2 (new) appears to work on
AlphaServers too (but setting up MPICH2 with the
mpd ring is just too complicated).

Hence, I would prefer OpenMPI instead.
Is there a way to get OpenMPI to work on my AlphaSystems?

Thanks,
Rob.










[OMPI devel] 1.3.1 version/subversion discrepancy

2009-04-14 Thread Rob Egan

Hello,

I just installed Open MPI 1.3.1 and found that the following assertion
now fails.

 MPI_Get_version(&version, &subversion);
 Assert(version == MPI_VERSION && subversion == MPI_SUBVERSION);


This is an excerpt from pyMPI, which I have been using with Open MPI 1.2.7.

According to mpi.h, MPI_VERSION == 2 and MPI_SUBVERSION == 1, but the
procedure MPI_Get_version returns 2 and 0 for version & subversion
respectively.

I think this should be a quick fix: sync up mpi.h and get_version.c.
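To make the mismatch concrete, here is a minimal sketch of a check that reports the discrepancy instead of aborting on it. It deliberately does not call MPI itself: `check_mpi_version` and the `version_check` codes are hypothetical names for this illustration only. In a real program the first pair of arguments would come from `MPI_VERSION` and `MPI_SUBVERSION` in mpi.h, and the second pair from `MPI_Get_version`.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical result codes for this sketch (not part of MPI). */
enum version_check { VERSIONS_MATCH = 0, VERSIONS_DIFFER = 1 };

/*
 * Compare the version pair compiled in from mpi.h with the pair
 * reported by the library at run time, and report any difference
 * on stderr rather than aborting.
 */
enum version_check check_mpi_version(int hdr_major, int hdr_minor,
                                     int rt_major, int rt_minor)
{
    /* Version numbers are never negative; catch swapped or bad inputs. */
    assert(hdr_major >= 0 && hdr_minor >= 0);
    assert(rt_major >= 0 && rt_minor >= 0);

    if (hdr_major == rt_major && hdr_minor == rt_minor)
        return VERSIONS_MATCH;

    fprintf(stderr,
            "MPI version mismatch: mpi.h says %d.%d, "
            "MPI_Get_version says %d.%d\n",
            hdr_major, hdr_minor, rt_major, rt_minor);
    return VERSIONS_DIFFER;
}
```

With the values reported above, `check_mpi_version(2, 1, 2, 0)` would return `VERSIONS_DIFFER`, which is the case the pyMPI assertion trips over.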

Thanks,
Rob Egan



Re: [OMPI devel] [mpich-discuss] ROMIO+Lustre problems in OpenMPI 1.8.3

2014-10-28 Thread Rob Latham



On 10/28/2014 06:00 AM, Paul Kapinos wrote:

Dear Open MPI and ROMIO developers,

We use Open MPI v.1.6.x and 1.8.x in our cluster.
We have a Lustre file system; we wish to use MPI_IO.
So our Open MPI builds are configured with this flag:
 --with-io-romio-flags='--with-file-system=testfs+ufs+nfs+lustre'

In our newest installation openmpi/1.8.3 we found that MPI_IO is *broken*.

A short search for the root cause brought the following to light:

- the ROMIO component 'MCA io: romio' isn't there at all in the affected
version, because

- configure of ROMIO has *failed* (cf. logs a, b, c),
- because lustre_user.h was found but could not be compiled.


lustre_user.h cannot be compiled because quota defines won't compile. 
Ugh, what a mess.


A while back I noticed this and fixed it by removing an XOPEN_SOURCE 
feature test macro:


http://trac.mpich.org/projects/mpich/ticket/1973

Then, on solaris with --enable-strict we needed to put *back* the 
XOPEN_SOURCE macro or else pread and pwrite would be undefined.


So what I really need to do is delete XOPEN_SOURCE since it causes such 
headaches, and on the rare platforms that only have pread/pwrite defined 
if you take extraordinary measures, if at all, I'll have a ROMIO pread 
and pwrite that simply do seek + write (or read).
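As an illustration of the seek + read/write idea, here is a minimal sketch. It is not the ROMIO implementation: `fallback_pread`, `fallback_pwrite` and `fallback_roundtrip` are hypothetical names, and restoring the file offset afterwards is an assumption about what such a wrapper would want to do. Unlike the real pread/pwrite, this version is not atomic if other threads or processes share the file descriptor.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for pread(): seek, read, restore the offset. */
ssize_t fallback_pread(int fd, void *buf, size_t count, off_t offset)
{
    assert(buf != NULL || count == 0);
    off_t saved = lseek(fd, 0, SEEK_CUR);      /* remember position */
    if (saved == (off_t)-1 || lseek(fd, offset, SEEK_SET) == (off_t)-1)
        return -1;
    ssize_t n = read(fd, buf, count);
    if (lseek(fd, saved, SEEK_SET) == (off_t)-1)
        return -1;
    return n;
}

/* Hypothetical stand-in for pwrite(): seek, write, restore the offset. */
ssize_t fallback_pwrite(int fd, const void *buf, size_t count, off_t offset)
{
    assert(buf != NULL || count == 0);
    off_t saved = lseek(fd, 0, SEEK_CUR);
    if (saved == (off_t)-1 || lseek(fd, offset, SEEK_SET) == (off_t)-1)
        return -1;
    ssize_t n = write(fd, buf, count);
    if (lseek(fd, saved, SEEK_SET) == (off_t)-1)
        return -1;
    return n;
}

/* Round-trip self-check on a scratch file; returns 0 on success. */
int fallback_roundtrip(const char *path)
{
    char out[5] = {0};
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    int ok = fallback_pwrite(fd, "romio", 5, 3) == 5
          && fallback_pread(fd, out, 5, 3) == 5
          && memcmp(out, "romio", 5) == 0
          && lseek(fd, 0, SEEK_CUR) == 0;      /* offset left untouched */
    close(fd);
    unlink(path);
    return ok ? 0 : -1;
}
```

Only lseek, read and write are used here, so the sketch does not depend on the feature test macro that guards pread and pwrite on strict platforms.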


For now, please delete the XOPEN_SOURCE line at the very beginning of 
src/mpi/romio/adio/ad_lustre/ad_lustre_rwcontig.c


==rob





In our system, there are two lustre_user.h available:
$ locate lustre_user.h
/usr/include/linux/lustre_user.h
/usr/include/lustre/lustre_user.h
As I'm not very familiar with Lustre, I have attached both of them.

pk224850@cluster:~[509]$ uname -a
Linux cluster.rz.RWTH-Aachen.DE 2.6.32-431.29.2.el6.x86_64 #1 SMP Tue
Sep 9 13:45:55 CDT 2014 x86_64 x86_64 x86_64 GNU/Linux

pk224850@cluster:~[510]$ cat /etc/issue
Scientific Linux release 6.5 (Carbon)

Note that openmpi/1.8.1 seems to be fully OK (MPI_IO works) in our
environment.

Best

Paul Kapinos

P.S. Is there a configure flag that will enforce ROMIO? That is, when
ROMIO is not available, configure would fail. This would make such hidden
errors visible at installation time.






a) Log in Open MPI's config.log:
--

configure:226781: OMPI configuring in ompi/mca/io/romio/romio
configure:226866: running /bin/sh './configure'
--with-file-system=testfs+ufs+nfs+lustre  FROM_OMPI=yes CC="icc
-std=c99" CFLAGS="-DNDEBUG -O3 -ip -axAVX,SSE4.2,SSE4.1 -fp-model fast=2
-m64 -finline-functions -fno-strict-aliasing -restrict -fexceptions
-Qoption,cpp,--extended_float_types -pthread" CPPFLAGS="
-I/w0/tmp/pk224850/linuxc2_9713/openmpi-1.8.3_linux64_intel/opal/mca/hwloc/hwloc172/hwloc/include
-I/w0/tmp/pk224850/linuxc2_9713/openmpi-1.8.3_linux64_intel/opal/mca/event/libevent2021/libevent
-I/w0/tmp/pk224850/linuxc2_9713/openmpi-1.8.3_linux64_intel/opal/mca/event/libevent2021/libevent/include"
FFLAGS="-O3 -ip -axAVX,SSE4.2,SSE4.1 -fp-model fast=2   -m64  "
LDFLAGS="-O3 -ip -axAVX,SSE4.2,SSE4.1 -fp-model fast=2   -m64
-fexceptions " --enable-shared --disable-static
--with-file-system=testfs+ufs+nfs+lustre
--prefix=/opt/MPI/openmpi-1.8.3/linux/intel --disable-aio
--cache-file=/dev/null --srcdir=. --disable-option-checking
configure:226876: /bin/sh './configure' *failed* for
ompi/mca/io/romio/romio
configure:226911: WARNING: ROMIO distribution did not configure
successfully
configure:227425: checking if MCA component io:romio can compile
configure:227427: result: no
--




b) dump of Open MPI's 'configure' output to the console:
--

checking lustre/lustre_user.h usability... no
checking lustre/lustre_user.h presence... yes
configure: WARNING: lustre/lustre_user.h: present but cannot be compiled
configure: WARNING: lustre/lustre_user.h: check for missing
prerequisite headers?
configure: WARNING: lustre/lustre_user.h: see the Autoconf documentation
configure: WARNING: lustre/lustre_user.h: section "Present But
Cannot Be Compiled"
configure: WARNING: lustre/lustre_user.h: proceeding with the compiler's
result
configure: WARNING: ##  ##
configure: WARNING: ## Report this to disc...@mpich.org ##
configure: WARNING: ##  ##
checking for lustre/lustre_user.h... no
configure: error: LUSTRE support requested but cannot find
lustre/lustre_user.h header file
configure: /bin/sh './configure' *failed* for ompi/mca/io/romio/romio
configure: WARNING: ROMIO distribution did not configure successfully
checking if MCA component io:romio can compile... no
--


c) o

Re: [OMPI devel] [mpich-discuss] BUG in ADIOI_NFS_WriteStrided

2014-12-19 Thread Rob Latham



On 12/19/2014 02:33 PM, Eric Chamberland wrote:

Hi Howard,

the bug is present also with MPICH-3.1.3...

So, for disc...@mpich.org list readers, here is the valgrind output for
a bug revealed with valgrind (sorry, I didn't compile MPICH in debug
mode), reported to OpenMPI earlier today:
http://www.open-mpi.org/community/lists/devel/2014/12/16691.php

Sorry for the "duplicated" report again.

I encountered a new bug while testing our collective MPI I/O
functionalities over NFS.  This is not a big issue for us, but I think
someone should have a look at it.


Please don't use NFS for MPI-IO.  ROMIO makes a best effort but there's 
no way to guarantee you won't corrupt a block of data (NFS clients are 
allowed to cache... arbitrarily, it seems).  There are so many good 
parallel file systems with saner consistency semantics.


This looks like maybe a calloc would clean it right up.
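A minimal sketch of why a calloc would help, under stated assumptions: this is not the ROMIO code, and `alloc_staging`, `pack_strided` and `staging_gaps_are_zero` are hypothetical names. The assumption is that a staging buffer is only partially filled before being handed to write(), so with a plain malloc the untouched gaps hold undefined bytes, while a zero-initialised allocation leaves every byte defined.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Zero-initialised staging buffer: every byte is defined even if
 * only some regions are filled in later. */
unsigned char *alloc_staging(size_t size)
{
    return calloc(size, 1);
}

/* Copy `count` pieces of `piece_len` bytes from `src` into `dst`,
 * leaving `stride - piece_len` untouched bytes after each piece. */
void pack_strided(unsigned char *dst, const unsigned char *src,
                  size_t count, size_t piece_len, size_t stride)
{
    assert(stride >= piece_len);
    for (size_t i = 0; i < count; i++)
        memcpy(dst + i * stride, src + i * piece_len, piece_len);
}

/* Self-check: returns 0 if the gaps between pieces read back as zero. */
int staging_gaps_are_zero(void)
{
    const unsigned char src[4] = { 'a', 'b', 'c', 'd' };
    unsigned char *buf = alloc_staging(8);
    if (buf == NULL)
        return -1;
    pack_strided(buf, src, 2, 2, 4);   /* two 2-byte pieces, stride 4 */
    int ok = memcmp(buf, "ab\0\0cd\0\0", 8) == 0;
    free(buf);
    return ok ? 0 : -1;
}
```

Whether zero-filling is the right fix, or the write should simply skip the unfilled region, depends on the actual ROMIO code path; the sketch only shows the allocation difference.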

==rob



While running with 3 processes, we get this error on rank #0 and rank #2,
knowing that rank #1 has nothing to write (0 length size) on this
particular PMPI_File_write_all_begin call:

==3434== Syscall param write(buf) points to uninitialised byte(s)
==3434==    at 0x108D0380: __write_nocancel (in /lib64/libpthread-2.17.so)
==3434==    by 0x11DB9D46: ADIOI_NFS_WriteStrided (in /opt/mpich-3.1.3/lib64/libmpi.so.12.0.4)
==3434==    by 0x11DD264F: ADIOI_GEN_WriteStridedColl (in /opt/mpich-3.1.3/lib64/libmpi.so.12.0.4)
==3434==    by 0x11DB5F8F: MPIOI_File_write_all_begin (in /opt/mpich-3.1.3/lib64/libmpi.so.12.0.4)
==3434==    by 0x11DB60F3: PMPI_File_write_all_begin (in /opt/mpich-3.1.3/lib64/libmpi.so.12.0.4)
==3434==    by 0x4CCAE6: SYEnveloppeMessage PAIO::ecritureIndexeParBlocMPI, PtrPorteurConst, FunctorCopieInfosSurDansVectPAType, std::vector*, std::allocator*> > const>, FunctorAccesseurPorteurLocal > >(PAGroupeProcessus&, ADIOI_FileD*, long long, PtrPorteurConst, PtrPorteurConst, FunctorCopieInfosSurDansVectPAType, std::vector*, std::allocator*> > const>&, FunctorAccesseurPorteurLocal >&, long, DistributionComposantes&, long, unsigned long, unsigned long, std::string const&) (in /home/mefpp_ericc/GIREF/bin/Test.LectureEcritureGISMPI.opt)
==3434==    by 0x4DDBFE: GISLectureEcriture::visiteMaillage(Maillage const&) (in /home/mefpp_ericc/GIREF/bin/Test.LectureEcritureGISMPI.opt)
==3434==    by 0x4BCB22: GISLectureEcriture::ecritGISMPI(std::string, GroupeInfoSur const&, std::string const&) (in /home/mefpp_ericc/GIREF/bin/Test.LectureEcritureGISMPI.opt)
==3434==    by 0x48E213: main (in /home/mefpp_ericc/GIREF/bin/Test.LectureEcritureGISMPI.opt)
==3434==  Address 0x1a12cd10 is 224 bytes inside a block of size 524,448 alloc'd
==3434==    at 0x4C2C27B: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==3434==    by 0x11DADA96: MPL_trmalloc (in /opt/mpich-3.1.3/lib64/libmpi.so.12.0.4)
==3434==    by 0x11DD9285: ADIOI_Malloc_fn (in /opt/mpich-3.1.3/lib64/libmpi.so.12.0.4)
==3434==    by 0x11DB9AC8: ADIOI_NFS_WriteStrided (in /opt/mpich-3.1.3/lib64/libmpi.so.12.0.4)
==3434==    by 0x11DD264F: ADIOI_GEN_WriteStridedColl (in /opt/mpich-3.1.3/lib64/libmpi.so.12.0.4)
==3434==    by 0x11DB5F8F: MPIOI_File_write_all_begin (in /opt/mpich-3.1.3/lib64/libmpi.so.12.0.4)
==3434==    by 0x11DB60F3: PMPI_File_write_all_begin (in /opt/mpich-3.1.3/lib64/libmpi.so.12.0.4)
==3434==    by 0x4CCAE6: SYEnveloppeMessage PAIO::ecritureIndexeParBlocMPI, PtrPorteurConst, FunctorCopieInfosSurDansVectPAType, std::vector*, std::allocator*> > const>, FunctorAccesseurPorteurLocal > >(PAGroupeProcessus&, ADIOI_FileD*, long long, PtrPorteurConst, PtrPorteurConst, FunctorCopieInfosSurDansVectPAType, std::vector*, std::allocator*> > const>&, FunctorAccesseurPorteurLocal >&, long, DistributionComposantes&, long, unsigned long, unsigned long, std::string const&) (in /home/mefpp_ericc/GIREF/bin/Test.LectureEcritureGISMPI.opt)
==3434==    by 0x4DDBFE: GISLectureEcriture::visiteMaillage(Maillage const&) (in /home/mefpp_ericc/GIREF/bin/Test.LectureEcritureGISMPI.opt)
==3434==    by 0x4BCB22: GISLectureEcriture::ecritGISMPI(std::string, GroupeInfoSur const&, std::string const&) (in /home/mefpp_ericc/GIREF/bin/Test.LectureEcritureGISMPI.opt)
==3434==    by 0x48E213: main (in /home/mefpp_ericc/GIREF/bin/Test.LectureEcritureGISMPI.opt)
==3434==  Uninitialised value was created by a client request
==3434==    at 0x11DADEE5: MPL_trmalloc (in /opt/mpich-3.1.3/lib64/libmpi.so.12.0.4)
==3434==    by 0x11DD9285: ADIOI_Malloc_fn (in /opt/mpich-3.1.3/lib64/libmpi.so.12.0.4)
==3434==    by 0x11DB9AC8: ADIOI_NFS_WriteStrided (in /opt/mpich-3.1.3/lib64/libmpi.so.12.0.4)
==3434==    by 0x11DD264F: ADIOI_GEN_WriteStridedColl (in /opt/mpich-3.1.3/lib64/libmpi.so.12.0.4)
==3434==    by 0x11DB5F8F: MPIOI_File_write_all_begin (in /opt/mpich-3.1.3/lib64/libmpi.so.12.0.4)
==3434==    by 0x11DB60F3: PMPI_File

Re: [OMPI devel] [mpich-discuss] ROMIO+Lustre problems in OpenMPI 1.8.3

2015-02-26 Thread Rob Latham



On 11/07/2014 06:26 AM, Ralph Castain wrote:

Hi Rob

Following up on this: I cannot find any reference to XOPEN_SOURCE in our 
included ROMIO source for Lustre. I only found one reference anywhere in ROMIO:

romio/adio/ad_xfs/ad_xfs.h:11:#define _XOPEN_SOURCE 500

Any other suggestions on what could be causing the problem?


I've fixed this in ROMIO by not mucking around with XOPEN_SOURCE at all, 
in either lustre or xfs or anywhere.


http://git.mpich.org/mpich.git/commit/4e80e1d2b
and
http://git.mpich.org/mpich.git/commit/5a10283bf7
==rob



Thanks
Ralph


