[OMPI devel] AlphaServers & OpenMPI
Hi, What is the problem with supporting AlphaServers in OpenMPI? Of the alternatives, MPICH1 (very old) supports AlphaServers, and MPICH2 (new) appears to work on AlphaServers too, but setting up MPICH2 with the mpd ring is just too complicated. Hence, I would prefer OpenMPI. Is there a way to get OpenMPI to work on my Alpha systems? Thanks, Rob.
[OMPI devel] 1.3.1 version/subversion discrepancy
Hello,

I just installed Open MPI 1.3.1 and found that the following assertion now fails:

    MPI_Get_version(&version, &subversion);
    Assert(version == MPI_VERSION && subversion == MPI_SUBVERSION);

This is an excerpt from pyMPI, which I have been using with Open MPI 1.2.7. According to mpi.h, MPI_VERSION == 2 and MPI_SUBVERSION == 1, but the procedure MPI_Get_version returns 2 and 0 for version and subversion respectively. I think this is a quick fix to sync up mpi.h and get_version.c.

Thanks, Rob Egan
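To make the mismatch concrete, here is a minimal C sketch of the comparison that the pyMPI assertion performs. The names `mpi_level`, `levels_agree`, and `describe_levels` are hypothetical helpers, not part of MPI or pyMPI. In a real program the header pair would come from MPI_VERSION and MPI_SUBVERSION and the runtime pair from MPI_Get_version; plain integers are used here so that the sketch compiles without an MPI installation.

```c
#include <stdbool.h>
#include <stdio.h>

/* A (version, subversion) pair, e.g. {2, 1} for MPI 2.1. */
typedef struct {
    int version;
    int subversion;
} mpi_level;

/* True when the level advertised by the header and the level reported
 * at run time are identical, which is what the assertion expects. */
bool levels_agree(mpi_level header, mpi_level runtime)
{
    return header.version == runtime.version &&
           header.subversion == runtime.subversion;
}

/* Writes a one-line summary of both levels into buf and returns buf. */
const char *describe_levels(mpi_level header, mpi_level runtime,
                            char *buf, size_t len)
{
    snprintf(buf, len, "mpi.h says %d.%d, MPI_Get_version says %d.%d (%s)",
             header.version, header.subversion,
             runtime.version, runtime.subversion,
             levels_agree(header, runtime) ? "consistent" : "MISMATCH");
    return buf;
}
```

With the values reported above, `levels_agree((mpi_level){2, 1}, (mpi_level){2, 0})` is false, which is the condition that trips the assertion.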
Re: [OMPI devel] [mpich-discuss] ROMIO+Lustre problems in OpenMPI 1.8.3
On 10/28/2014 06:00 AM, Paul Kapinos wrote:

> Dear Open MPI and ROMIO developers,
>
> We use Open MPI v1.6.x and 1.8.x in our cluster. We have a Lustre file system and wish to use MPI_IO, so our Open MPI builds are compiled with this flag:
>
>     --with-io-romio-flags='--with-file-system=testfs+ufs+nfs+lustre'
>
> In our newest installation, openmpi/1.8.3, we found that MPI_IO is *broken*. A short search for the root of the evil brought the following to light:
>
> - the ROMIO component 'MCA io: romio' isn't there at all in the affected version, because
> - configure of ROMIO has *failed* (cf. logs a, b, c),
> - because lustre_user.h was found but could not be compiled.

lustre_user.h cannot be compiled because quota defines won't compile. Ugh, what a mess. A while back I noticed this and fixed it by removing an XOPEN_SOURCE feature test macro: http://trac.mpich.org/projects/mpich/ticket/1973

Then, on Solaris with --enable-strict, we needed to put *back* the XOPEN_SOURCE macro or else pread and pwrite would be undefined.

So what I really need to do is delete XOPEN_SOURCE, since it causes such headaches. On the rare platforms that only have pread/pwrite defined if you take extraordinary measures, if at all, I'll have a ROMIO pread and pwrite that simply do seek + write (or read).

For now, please delete the XOPEN_SOURCE line at the very beginning of src/mpi/romio/adio/ad_lustre/ad_lustre_rwcontig.c

==rob

> In our system, there are two lustre_user.h available:
>
>     $ locate lustre_user.h
>     /usr/include/linux/lustre_user.h
>     /usr/include/lustre/lustre_user.h
>
> As I'm not very familiar with Lustre, I just attach both of them.
>
>     pk224850@cluster:~[509]$ uname -a
>     Linux cluster.rz.RWTH-Aachen.DE 2.6.32-431.29.2.el6.x86_64 #1 SMP Tue Sep 9 13:45:55 CDT 2014 x86_64 x86_64 x86_64 GNU/Linux
>     pk224850@cluster:~[510]$ cat /etc/issue
>     Scientific Linux release 6.5 (Carbon)
>
> Note that openmpi/1.8.1 seems to be fully OK (MPI_IO works) in our environment.
>
> Best,
> Paul Kapinos
>
> P.S. Is there a configure flag that will enforce ROMIO? That is, when ROMIO is not available, configure would fail. This would make such hidden errors public at installation time.
>
> a) Log in Open MPI's config.log:
> --
> configure:226781: OMPI configuring in ompi/mca/io/romio/romio
> configure:226866: running /bin/sh './configure' --with-file-system=testfs+ufs+nfs+lustre FROM_OMPI=yes CC="icc -std=c99" CFLAGS="-DNDEBUG -O3 -ip -axAVX,SSE4.2,SSE4.1 -fp-model fast=2 -m64 -finline-functions -fno-strict-aliasing -restrict -fexceptions -Qoption,cpp,--extended_float_types -pthread" CPPFLAGS=" -I/w0/tmp/pk224850/linuxc2_9713/openmpi-1.8.3_linux64_intel/opal/mca/hwloc/hwloc172/hwloc/include -I/w0/tmp/pk224850/linuxc2_9713/openmpi-1.8.3_linux64_intel/opal/mca/event/libevent2021/libevent -I/w0/tmp/pk224850/linuxc2_9713/openmpi-1.8.3_linux64_intel/opal/mca/event/libevent2021/libevent/include" FFLAGS="-O3 -ip -axAVX,SSE4.2,SSE4.1 -fp-model fast=2 -m64 " LDFLAGS="-O3 -ip -axAVX,SSE4.2,SSE4.1 -fp-model fast=2 -m64 -fexceptions " --enable-shared --disable-static --with-file-system=testfs+ufs+nfs+lustre --prefix=/opt/MPI/openmpi-1.8.3/linux/intel --disable-aio --cache-file=/dev/null --srcdir=. --disable-option-checking
> configure:226876: /bin/sh './configure' *failed* for ompi/mca/io/romio/romio
> configure:226911: WARNING: ROMIO distribution did not configure successfully
> configure:227425: checking if MCA component io:romio can compile
> configure:227427: result: no
> --
>
> b) dump of Open MPI's 'configure' output to the console:
> --
> checking lustre/lustre_user.h usability... no
> checking lustre/lustre_user.h presence... yes
> configure: WARNING: lustre/lustre_user.h: present but cannot be compiled
> configure: WARNING: lustre/lustre_user.h: check for missing prerequisite headers?
> configure: WARNING: lustre/lustre_user.h: see the Autoconf documentation
> configure: WARNING: lustre/lustre_user.h: section "Present But Cannot Be Compiled"
> configure: WARNING: lustre/lustre_user.h: proceeding with the compiler's result
> configure: WARNING: ## ##
> configure: WARNING: ## Report this to disc...@mpich.org ##
> configure: WARNING: ## ##
> checking for lustre/lustre_user.h... no
> configure: error: LUSTRE support requested but cannot find lustre/lustre_user.h header file
> configure: /bin/sh './configure' *failed* for ompi/mca/io/romio/romio
> configure: WARNING: ROMIO distribution did not configure successfully
> checking if MCA component io:romio can compile... no
> --
>
> c) o
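The "seek + write (or read)" fallback that Rob mentions for platforms lacking pread/pwrite could look roughly like the C sketch below. This is an illustration under stated assumptions, not ROMIO's actual code: `fallback_pwrite`, `fallback_pread`, and `roundtrip_demo` are hypothetical names. Unlike the real pread/pwrite, this approach moves the file offset and is not atomic, so it is only safe when no other thread shares the descriptor.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for pwrite(): position the descriptor, then
 * write. Returns the byte count written, or -1 on error. */
ssize_t fallback_pwrite(int fd, const void *buf, size_t count, off_t offset)
{
    if (lseek(fd, offset, SEEK_SET) == (off_t)-1)
        return -1;
    return write(fd, buf, count);
}

/* Hypothetical stand-in for pread(): position the descriptor, then
 * read. Returns the byte count read, or -1 on error. */
ssize_t fallback_pread(int fd, void *buf, size_t count, off_t offset)
{
    if (lseek(fd, offset, SEEK_SET) == (off_t)-1)
        return -1;
    return read(fd, buf, count);
}

/* Writes four bytes at offset 16 of a temporary file and reads them
 * back. Returns 0 if the round trip succeeds, -1 otherwise. */
int roundtrip_demo(void)
{
    char path[] = "/tmp/seekwrite_XXXXXX";
    char back[4] = {0};
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path); /* the file disappears once fd is closed */

    int ok = fallback_pwrite(fd, "data", 4, 16) == 4 &&
             fallback_pread(fd, back, 4, 16) == 4 &&
             memcmp(back, "data", 4) == 0;
    close(fd);
    return ok ? 0 : -1;
}
```

Because both helpers rely on lseek, they fail with -1 on descriptors that cannot seek, such as pipes.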
Re: [OMPI devel] [mpich-discuss] BUG in ADIOI_NFS_WriteStrided
On 12/19/2014 02:33 PM, Eric Chamberland wrote:

> Hi Howard,
>
> the bug is also present with MPICH-3.1.3...
>
> So, for disc...@mpich.org list readers, here is the valgrind output for a bug revealed with valgrind (sorry, I didn't compile MPICH in debug mode), reported to OpenMPI earlier today:
>
> http://www.open-mpi.org/community/lists/devel/2014/12/16691.php
>
> Sorry again for the "duplicated" report.
>
> I encountered a new bug while testing our collective MPI I/O functionalities over NFS. This is not a big issue for us, but I think someone should have a look at it.

Please don't use NFS for MPI-IO. ROMIO makes a best effort, but there's no way to guarantee you won't corrupt a block of data (NFS clients are allowed to cache... arbitrarily, it seems). There are so many good parallel file systems with saner consistency semantics.

This looks like maybe a calloc would clean it right up.

==rob

> While running at 3 processes, we have this error on rank #0 and rank #2, knowing that rank #1 has nothing to write (0 length size) on this particular PMPI_File_write_all_begin call:
>
> ==3434== Syscall param write(buf) points to uninitialised byte(s)
> ==3434==    at 0x108D0380: __write_nocancel (in /lib64/libpthread-2.17.so)
> ==3434==    by 0x11DB9D46: ADIOI_NFS_WriteStrided (in /opt/mpich-3.1.3/lib64/libmpi.so.12.0.4)
> ==3434==    by 0x11DD264F: ADIOI_GEN_WriteStridedColl (in /opt/mpich-3.1.3/lib64/libmpi.so.12.0.4)
> ==3434==    by 0x11DB5F8F: MPIOI_File_write_all_begin (in /opt/mpich-3.1.3/lib64/libmpi.so.12.0.4)
> ==3434==    by 0x11DB60F3: PMPI_File_write_all_begin (in /opt/mpich-3.1.3/lib64/libmpi.so.12.0.4)
> ==3434==    by 0x4CCAE6: SYEnveloppeMessage PAIO::ecritureIndexeParBlocMPI, PtrPorteurConst, FunctorCopieInfosSurDansVectPAType, std::vector*, std::allocator*> > const>, FunctorAccesseurPorteurLocal > >(PAGroupeProcessus&, ADIOI_FileD*, long long, PtrPorteurConst, PtrPorteurConst, FunctorCopieInfosSurDansVectPAType, std::vector*, std::allocator*> > const>&, FunctorAccesseurPorteurLocal >&, long, DistributionComposantes&, long, unsigned long, unsigned long, std::string const&) (in /home/mefpp_ericc/GIREF/bin/Test.LectureEcritureGISMPI.opt)
> ==3434==    by 0x4DDBFE: GISLectureEcriture::visiteMaillage(Maillage const&) (in /home/mefpp_ericc/GIREF/bin/Test.LectureEcritureGISMPI.opt)
> ==3434==    by 0x4BCB22: GISLectureEcriture::ecritGISMPI(std::string, GroupeInfoSur const&, std::string const&) (in /home/mefpp_ericc/GIREF/bin/Test.LectureEcritureGISMPI.opt)
> ==3434==    by 0x48E213: main (in /home/mefpp_ericc/GIREF/bin/Test.LectureEcritureGISMPI.opt)
> ==3434== Address 0x1a12cd10 is 224 bytes inside a block of size 524,448 alloc'd
> ==3434==    at 0x4C2C27B: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
> ==3434==    by 0x11DADA96: MPL_trmalloc (in /opt/mpich-3.1.3/lib64/libmpi.so.12.0.4)
> ==3434==    by 0x11DD9285: ADIOI_Malloc_fn (in /opt/mpich-3.1.3/lib64/libmpi.so.12.0.4)
> ==3434==    by 0x11DB9AC8: ADIOI_NFS_WriteStrided (in /opt/mpich-3.1.3/lib64/libmpi.so.12.0.4)
> ==3434==    by 0x11DD264F: ADIOI_GEN_WriteStridedColl (in /opt/mpich-3.1.3/lib64/libmpi.so.12.0.4)
> ==3434==    by 0x11DB5F8F: MPIOI_File_write_all_begin (in /opt/mpich-3.1.3/lib64/libmpi.so.12.0.4)
> ==3434==    by 0x11DB60F3: PMPI_File_write_all_begin (in /opt/mpich-3.1.3/lib64/libmpi.so.12.0.4)
> ==3434==    by 0x4CCAE6: SYEnveloppeMessage PAIO::ecritureIndexeParBlocMPI, PtrPorteurConst, FunctorCopieInfosSurDansVectPAType, std::vector*, std::allocator*> > const>, FunctorAccesseurPorteurLocal > >(PAGroupeProcessus&, ADIOI_FileD*, long long, PtrPorteurConst, PtrPorteurConst, FunctorCopieInfosSurDansVectPAType, std::vector*, std::allocator*> > const>&, FunctorAccesseurPorteurLocal >&, long, DistributionComposantes&, long, unsigned long, unsigned long, std::string const&) (in /home/mefpp_ericc/GIREF/bin/Test.LectureEcritureGISMPI.opt)
> ==3434==    by 0x4DDBFE: GISLectureEcriture::visiteMaillage(Maillage const&) (in /home/mefpp_ericc/GIREF/bin/Test.LectureEcritureGISMPI.opt)
> ==3434==    by 0x4BCB22: GISLectureEcriture::ecritGISMPI(std::string, GroupeInfoSur const&, std::string const&) (in /home/mefpp_ericc/GIREF/bin/Test.LectureEcritureGISMPI.opt)
> ==3434==    by 0x48E213: main (in /home/mefpp_ericc/GIREF/bin/Test.LectureEcritureGISMPI.opt)
> ==3434== Uninitialised value was created by a client request
> ==3434==    at 0x11DADEE5: MPL_trmalloc (in /opt/mpich-3.1.3/lib64/libmpi.so.12.0.4)
> ==3434==    by 0x11DD9285: ADIOI_Malloc_fn (in /opt/mpich-3.1.3/lib64/libmpi.so.12.0.4)
> ==3434==    by 0x11DB9AC8: ADIOI_NFS_WriteStrided (in /opt/mpich-3.1.3/lib64/libmpi.so.12.0.4)
> ==3434==    by 0x11DD264F: ADIOI_GEN_WriteStridedColl (in /opt/mpich-3.1.3/lib64/libmpi.so.12.0.4)
> ==3434==    by 0x11DB5F8F: MPIOI_File_write_all_begin (in /opt/mpich-3.1.3/lib64/libmpi.so.12.0.4)
> ==3434==    by 0x11DB60F3: PMPI_File
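Rob's remark that "a calloc would clean it right up" can be illustrated with a small C sketch. It is not the ROMIO patch; `alloc_staging_buffer` and `stage_piece` are hypothetical names chosen for this example. The idea is that a staging buffer obtained from calloc starts out zeroed, so any gap between the pieces copied into it holds defined bytes when the whole buffer is later handed to write().

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: allocate a staging buffer whose bytes all start
 * at zero, so a gap that is never filled is still initialised. */
unsigned char *alloc_staging_buffer(size_t size)
{
    return calloc(size, 1);
}

/* Copy len bytes of src into the staging buffer at offset.
 * Returns 0 on success, -1 if the piece would not fit. */
int stage_piece(unsigned char *buf, size_t size, size_t offset,
                const void *src, size_t len)
{
    if (offset > size || len > size - offset)
        return -1;
    memcpy(buf + offset, src, len);
    return 0;
}
```

For instance, staging "AAAA" at offset 0 and "BBBB" at offset 8 in a 16-byte buffer leaves bytes 4 to 7 and 12 to 15 as zeros; with malloc those bytes would be indeterminate, which is the kind of condition the valgrind report flags.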
Re: [OMPI devel] [mpich-discuss] ROMIO+Lustre problems in OpenMPI 1.8.3
On 11/07/2014 06:26 AM, Ralph Castain wrote:

> Hi Rob
>
> Following up on this: I cannot find any reference to XOPEN_SOURCE in our included ROMIO source for Lustre. I only found one reference anywhere in ROMIO:
>
>     romio/adio/ad_xfs/ad_xfs.h:11:#define _XOPEN_SOURCE 500
>
> Any other suggestions on what could be causing the problem?

I've fixed this in ROMIO by not mucking around with XOPEN_SOURCE at all, in either lustre or xfs or anywhere.

http://git.mpich.org/mpich.git/commit/4e80e1d2b and http://git.mpich.org/mpich.git/commit/5a10283bf7

==rob

> Thanks
> Ralph
>
> On Oct 28, 2014, at 7:32 AM, Rob Latham wrote:
>
> > On 10/28/2014 06:00 AM, Paul Kapinos wrote:
> > [...]