Hi Eric,

Does your app also work with MPICH? The ROMIO in Open MPI is getting a bit old, so it would be useful to know whether you see the same valgrind error with a recent MPICH.
Howard

2014-12-19 9:50 GMT-07:00 Eric Chamberland <eric.chamberl...@giref.ulaval.ca>:

> Hi,
>
> I encountered a new bug while testing our collective MPI I/O
> functionalities over NFS. This is not a big issue for us, but I think
> someone should have a look at it.
>
> While running with 3 processes, we get this error on rank #0 and rank #2,
> knowing that rank #1 has nothing to write (zero length) on this
> particular PMPI_File_write_all_begin call:
>
> ==19211== Syscall param write(buf) points to uninitialised byte(s)
> ==19211==    at 0x10CB739D: ??? (in /lib64/libpthread-2.17.so)
> ==19211==    by 0x27438431: ADIOI_NFS_WriteStrided (ad_nfs_write.c:645)
> ==19211==    by 0x27451963: ADIOI_GEN_WriteStridedColl (ad_write_coll.c:159)
> ==19211==    by 0x274321BD: MPIOI_File_write_all_begin (write_allb.c:114)
> ==19211==    by 0x27431DBF: mca_io_romio_dist_MPI_File_write_all_begin (write_allb.c:44)
> ==19211==    by 0x2742A367: mca_io_romio_file_write_all_begin (io_romio_file_write.c:264)
> ==19211==    by 0x12126520: PMPI_File_write_all_begin (pfile_write_all_begin.c:74)
> ==19211==    by 0x4D7CFB: SYEnveloppeMessage<std::string> PAIO::ecritureIndexeParBlocMPI<PAIOType<double>, PtrPorteurConst<Arete, Arete>, FunctorCopieInfosSurDansVectPAType<PAIOType<double>, std::vector<InfoSur<double, Arete>*, std::allocator<InfoSur<double, Arete>*> > const>, FunctorAccesseurPorteurLocal<PtrPorteurConst<Arete, Arete> > >(PAGroupeProcessus&, ompi_file_t*, long long, PtrPorteurConst<Arete, Arete>, PtrPorteurConst<Arete, Arete>, FunctorCopieInfosSurDansVectPAType<PAIOType<double>, std::vector<InfoSur<double, Arete>*, std::allocator<InfoSur<double, Arete>*> > const>&, FunctorAccesseurPorteurLocal<PtrPorteurConst<Arete, Arete> >&, long, DistributionComposantes&, long, unsigned long, unsigned long, std::string const&) (in /home/mefpp_ericc/GIREF/bin/Test.LectureEcritureGISMPI.opt)
> ==19211==    by 0x4E9A67: GISLectureEcriture<double>::visiteMaillage(Maillage const&) (in /home/mefpp_ericc/GIREF/bin/Test.LectureEcritureGISMPI.opt)
> ==19211==    by 0x4C79A2: GISLectureEcriture<double>::ecritGISMPI(std::string, GroupeInfoSur<double> const&, std::string const&) (in /home/mefpp_ericc/GIREF/bin/Test.LectureEcritureGISMPI.opt)
> ==19211==    by 0x4961AD: main (in /home/mefpp_ericc/GIREF/bin/Test.LectureEcritureGISMPI.opt)
> ==19211==  Address 0x295af060 is 144 bytes inside a block of size 524,288 alloc'd
> ==19211==    at 0x4C2C27B: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
> ==19211==    by 0x2745E78E: ADIOI_Malloc_fn (malloc.c:50)
> ==19211==    by 0x2743757C: ADIOI_NFS_WriteStrided (ad_nfs_write.c:497)
> ==19211==    by 0x27451963: ADIOI_GEN_WriteStridedColl (ad_write_coll.c:159)
> ==19211==    by 0x274321BD: MPIOI_File_write_all_begin (write_allb.c:114)
> ==19211==    by 0x27431DBF: mca_io_romio_dist_MPI_File_write_all_begin (write_allb.c:44)
> ==19211==    by 0x2742A367: mca_io_romio_file_write_all_begin (io_romio_file_write.c:264)
> ==19211==    by 0x12126520: PMPI_File_write_all_begin (pfile_write_all_begin.c:74)
> ==19211==    by 0x4D7CFB: SYEnveloppeMessage<std::string> PAIO::ecritureIndexeParBlocMPI<PAIOType<double>, PtrPorteurConst<Arete, Arete>, FunctorCopieInfosSurDansVectPAType<PAIOType<double>, std::vector<InfoSur<double, Arete>*, std::allocator<InfoSur<double, Arete>*> > const>, FunctorAccesseurPorteurLocal<PtrPorteurConst<Arete, Arete> > >(PAGroupeProcessus&, ompi_file_t*, long long, PtrPorteurConst<Arete, Arete>, PtrPorteurConst<Arete, Arete>, FunctorCopieInfosSurDansVectPAType<PAIOType<double>, std::vector<InfoSur<double, Arete>*, std::allocator<InfoSur<double, Arete>*> > const>&, FunctorAccesseurPorteurLocal<PtrPorteurConst<Arete, Arete> >&, long, DistributionComposantes&, long, unsigned long, unsigned long, std::string const&) (in /home/mefpp_ericc/GIREF/bin/Test.LectureEcritureGISMPI.opt)
> ==19211==    by 0x4E9A67: GISLectureEcriture<double>::visiteMaillage(Maillage const&) (in /home/mefpp_ericc/GIREF/bin/Test.LectureEcritureGISMPI.opt)
> ==19211==    by 0x4C79A2: GISLectureEcriture<double>::ecritGISMPI(std::string, GroupeInfoSur<double> const&, std::string const&) (in /home/mefpp_ericc/GIREF/bin/Test.LectureEcritureGISMPI.opt)
> ==19211==    by 0x4961AD: main (in /home/mefpp_ericc/GIREF/bin/Test.LectureEcritureGISMPI.opt)
> ==19211==  Uninitialised value was created by a heap allocation
> ==19211==    at 0x4C2C27B: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
> ==19211==    by 0x2745E78E: ADIOI_Malloc_fn (malloc.c:50)
> ==19211==    by 0x2743757C: ADIOI_NFS_WriteStrided (ad_nfs_write.c:497)
> ==19211==    by 0x27451963: ADIOI_GEN_WriteStridedColl (ad_write_coll.c:159)
> ==19211==    by 0x274321BD: MPIOI_File_write_all_begin (write_allb.c:114)
> ==19211==    by 0x27431DBF: mca_io_romio_dist_MPI_File_write_all_begin (write_allb.c:44)
> ==19211==    by 0x2742A367: mca_io_romio_file_write_all_begin (io_romio_file_write.c:264)
> ==19211==    by 0x12126520: PMPI_File_write_all_begin (pfile_write_all_begin.c:74)
> ==19211==    by 0x4D7CFB: SYEnveloppeMessage<std::string> PAIO::ecritureIndexeParBlocMPI<PAIOType<double>, PtrPorteurConst<Arete, Arete>, FunctorCopieInfosSurDansVectPAType<PAIOType<double>, std::vector<InfoSur<double, Arete>*, std::allocator<InfoSur<double, Arete>*> > const>, FunctorAccesseurPorteurLocal<PtrPorteurConst<Arete, Arete> > >(PAGroupeProcessus&, ompi_file_t*, long long, PtrPorteurConst<Arete, Arete>, PtrPorteurConst<Arete, Arete>, FunctorCopieInfosSurDansVectPAType<PAIOType<double>, std::vector<InfoSur<double, Arete>*, std::allocator<InfoSur<double, Arete>*> > const>&, FunctorAccesseurPorteurLocal<PtrPorteurConst<Arete, Arete> >&, long, DistributionComposantes&, long, unsigned long, unsigned long, std::string const&) (in /home/mefpp_ericc/GIREF/bin/Test.LectureEcritureGISMPI.opt)
> ==19211==    by 0x4E9A67: GISLectureEcriture<double>::visiteMaillage(Maillage const&) (in /home/mefpp_ericc/GIREF/bin/Test.LectureEcritureGISMPI.opt)
> ==19211==    by 0x4C79A2: GISLectureEcriture<double>::ecritGISMPI(std::string, GroupeInfoSur<double> const&, std::string const&) (in /home/mefpp_ericc/GIREF/bin/Test.LectureEcritureGISMPI.opt)
> ==19211==    by 0x4961AD: main (in /home/mefpp_ericc/GIREF/bin/Test.LectureEcritureGISMPI.opt)
> ==19211==
>
> I can't tell whether it is a big issue or not, but I thought I should
> mention it to the list.
>
> The valgrind error does not appear when I write to my local disk
> partition instead of an NFS partition, or when I run with only 1 process
> (which always has something to write for each PMPI_File_write_all_begin)
> and write to an NFS partition.
>
> Using openmpi-1.8.4rc3 compiled in "debug" mode:
>
> ompi_info -all:
> http://www.giref.ulaval.ca/~ericc/ompi_bug/ompi_info.all.184rc3.txt.gz
> config.log:
> http://www.giref.ulaval.ca/~ericc/ompi_bug/config.184rc3.log.gz
>
> Thanks,
>
> Eric
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: http://www.open-mpi.org/community/lists/devel/2014/12/16691.php
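
For anyone reading the trace above who has not met this memcheck message before, here is a minimal sketch of the general pattern it describes: a heap buffer obtained from malloc is only partly filled and then passed whole to write(). This is not ROMIO or Open MPI code and makes no claim about the actual cause inside ADIOI_NFS_WriteStrided; the helper name `write_partly_filled` and its parameters are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper, not part of ROMIO or Open MPI.
 * Allocates `size` bytes, fills only the first `filled` bytes, then hands
 * the whole buffer to write(). With zero_first == 0 the tail of the buffer
 * is never initialised, which is the situation memcheck reports as
 * "Syscall param write(buf) points to uninitialised byte(s)".
 * With zero_first != 0 the buffer comes from calloc, so every byte is defined.
 * Returns the number of bytes written, or -1 on error. */
ssize_t write_partly_filled(const char *path, size_t size, size_t filled,
                            int zero_first)
{
    assert(filled <= size);

    char *buf = zero_first ? calloc(1, size) : malloc(size);
    if (buf == NULL)
        return -1;

    memset(buf, 'x', filled);   /* only the head of the buffer gets real data */

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        free(buf);
        return -1;
    }

    ssize_t n = write(fd, buf, size);   /* the untouched tail is passed too */

    close(fd);
    free(buf);
    return n;
}
```

Calling `write_partly_filled("/tmp/out.bin", 4096, 144, 0)` from a small test program run under valgrind should produce the same kind of warning, while passing 1 as the last argument should silence it. Whether such a warning is harmless (padding that is never read back) or a real bug depends on what the library does with those bytes.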