[OMPI users] valgrind invalid read
Hi,

I'm using valgrind 3.12 with Open MPI 2.0.1. The code simply sends an integer to another process:

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main (int argc, char **argv)
{
    const int tag = 13;
    int size, rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size < 2) {
        fprintf(stderr, "Requires at least two processes.\n");
        exit(-1);
    }

    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        int i = 3;
        const int dest = 1;
        MPI_Send(&i, 1, MPI_INT, dest, tag, MPI_COMM_WORLD);
        printf("Rank %d: sent int\n", rank);
    }
    if (rank == 1) {
        int j;
        const int src = 0;
        MPI_Status status;
        MPI_Recv(&j, 1, MPI_INT, src, tag, MPI_COMM_WORLD, &status);
        printf("Rank %d: Received: int = %d\n", rank, j);
    }
    MPI_Finalize();
    return 0;
}

I'm getting the error:

valgrind MPI wrappers 46313: Active for pid 46313
valgrind MPI wrappers 46313: Try MPIWRAP_DEBUG=help for possible options
valgrind MPI wrappers 46314: Active for pid 46314
valgrind MPI wrappers 46314: Try MPIWRAP_DEBUG=help for possible options
Rank 0: sent int
==46314== Invalid read of size 4
==46314==    at 0x400A3D: main (basic.c:33)
==46314==  Address 0xffefff594 is on thread 1's stack
==46314==  in frame #0, created by main (basic.c:5)
==46314==
Rank 1: Received: int = 3

The invalid read is at the printf line. Do you have any idea why I am getting it?

I ran the code with:

LD_PRELOAD=$prefix/lib/valgrind/libmpiwrap-amd64-linux.so mpirun -np 2 $prefix/bin/valgrind ./exe

Thanks in advance,
Yann
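For reference, the commonly recommended invocation looks roughly like the sketch below: export the wrapper to the launched ranks with -x and point valgrind at the suppression file Open MPI installs (if your install ships one; the paths are placeholders and may differ on your system).

# Sketch only: run each rank under valgrind with the MPI wrapper preloaded.
export LD_PRELOAD=$prefix/lib/valgrind/libmpiwrap-amd64-linux.so
mpirun -np 2 -x LD_PRELOAD \
    $prefix/bin/valgrind --suppressions=$prefix/share/openmpi/openmpi-valgrind.supp \
    ./exe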
Re: [OMPI users] Using custom version of gfortran in mpifort
> On Nov 18, 2016, at 2:54 AM, Mahmood Naderan wrote:
>
> The mpifort wrapper uses the default gfortran compiler on the system. How can
> I give it another version of gfortran which has been installed in another
> folder?

The best way is to specify the compiler(s) that you want Open MPI to use when you configure/build Open MPI itself:

https://www.open-mpi.org/faq/?category=building#build-compilers

That will propagate your compiler choice down into the wrapper compilers.

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
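A sketch of both routes follows (the gcc install prefix is just a placeholder): point configure at the compiler when building Open MPI, or override the compiler that the mpifort wrapper invokes at run time via the OMPI_FC environment variable.

# Option 1: build Open MPI against the alternate gfortran (preferred).
./configure FC=/opt/gcc-6.2/bin/gfortran CC=/opt/gcc-6.2/bin/gcc ...

# Option 2: override the compiler mpifort calls, without rebuilding Open MPI.
OMPI_FC=/opt/gcc-6.2/bin/gfortran mpifort -o hello hello.f90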
Re: [OMPI users] openmpi-2.0.1
On Nov 17, 2016, at 3:43 PM, Gilles Gouaillardet wrote:
>
> if it still does not work, you can
> cd ompi/tools
> make V=1
>
> and post the output

Let me add to that: if that doesn't work, please send all the information listed here:

https://www.open-mpi.org/community/help/

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
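For reference, gathering that information typically looks something like the sketch below; the help page is the authoritative list of what to attach.

# Sketch: collect the usual diagnostics for the mailing list.
./configure [your options] 2>&1 | tee configure.out
make V=1 2>&1 | tee make.out            # verbose build output
ompi_info --all > ompi_info.out         # full install report (if the build completed)
# then attach config.log, configure.out, make.out and ompi_info.out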
[OMPI users] ScaLapack tester fails with 2.0.1, works with 1.10.4; Intel Omni-Path
Hello everybody,

I am observing failures in the xdsyevr (and xssyevr) ScaLapack self tests when running on one or two nodes with OpenMPI 2.0.1. With 1.10.4 no failures are observed. Also, with mvapich2 2.2 no failures are observed. The other testers appear to be working with all MPIs mentioned (I have to triple check again). I somehow overlooked the failures below at first.

The system is an Intel OmniPath system (newest Intel driver release 10.2), i.e. we are using the PSM2 mtl I believe.

I built both OpenMPIs with gcc 6.2 and the following identical options:

./configure FFLAGS="-O1" CFLAGS="-O1" FCFLAGS="-O1" CXXFLAGS="-O1" --with-psm2 --with-tm --with-hwloc=internal --enable-static --enable-orterun-prefix-by-default

The ScaLapack build is also with gcc 6.2 and openblas 0.2.19, using "-O1 -g" as FCFLAGS and CCFLAGS; the build is identical for all tests, only the wrapper compiler changes.

With OpenMPI 1.10.4 I see on a single node

mpirun -n 4 -x PATH -x LD_LIBRARY_PATH -x OMP_NUM_THREADS -mca oob_tcp_if_include eth0,team0 -host node009,node009,node009,node009 ./xdsyevr

  136 tests completed and passed residual checks.
    0 tests completed without checking.
    0 tests skipped for lack of memory.
    0 tests completed and failed.

With OpenMPI 1.10.4 I see on two nodes

mpirun -n 4 -x PATH -x LD_LIBRARY_PATH -x OMP_NUM_THREADS -mca oob_tcp_if_include eth0,team0 -host node009,node010,node009,node010 ./xdsyevr

  136 tests completed and passed residual checks.
    0 tests completed without checking.
    0 tests skipped for lack of memory.
    0 tests completed and failed.

With OpenMPI 2.0.1 I see on a single node

mpirun -n 4 -x PATH -x LD_LIBRARY_PATH -x OMP_NUM_THREADS -mca oob_tcp_if_include eth0,team0 -host node009,node009,node009,node009 ./xdsyevr

   32 tests completed and passed residual checks.
    0 tests completed without checking.
    0 tests skipped for lack of memory.
  104 tests completed and failed.

With OpenMPI 2.0.1 I see on two nodes

mpirun -n 4 -x PATH -x LD_LIBRARY_PATH -x OMP_NUM_THREADS -mca oob_tcp_if_include eth0,team0 -host node009,node010,node009,node010 ./xdsyevr

   32 tests completed and passed residual checks.
    0 tests completed without checking.
    0 tests skipped for lack of memory.
  104 tests completed and failed.

A typical failure looks like this in the output:

IL, IU, VL or VU altered by PDSYEVR
  500 1 1 1 8 Y 0.26 -1.00 0.19E-02 15.  FAILED
  500 1 2 1 8 Y 0.29 -1.00 0.79E-03 3.9  PASSED EVR
IL, IU, VL or VU altered by PDSYEVR
  500 1 1 2 8 Y 0.52 -1.00 0.82E-03 2.5  FAILED
  500 1 2 2 8 Y 0.41 -1.00 0.79E-03 2.3  PASSED EVR
  500 2 2 2 8 Y 0.18 -1.00 0.78E-03 3.0  PASSED EVR
IL, IU, VL or VU altered by PDSYEVR
  500 4 1 4 8 Y 0.09 -1.00 0.95E-03 4.1  FAILED
  500 4 4 1 8 Y 0.11 -1.00 0.91E-03 2.8  PASSED EVR

The variable OMP_NUM_THREADS is set to 1 to stop openblas from threading. We see similar problems with the Intel 2016 compilers, but I believe gcc is a good baseline.

Any ideas? For us this is a real problem, in that we do not know whether this indicates a network (transport) issue in the Intel software stack (libpsm2, hfi1 kernel module) which might affect our production codes, or whether this is an OpenMPI issue. We have some other problems I might ask about later on this list, but nothing which yields such a nice reproducer, and especially these other problems might well be application related.

Best Regards

Christof

--
Dr. rer. nat. Christof Köhler       email: c.koeh...@bccms.uni-bremen.de
Universitaet Bremen/ BCCMS          phone: +49-(0)421-218-62334
Am Fallturm 1/ TAB/ Raum 3.12       fax: +49-(0)421-218-62770
28359 Bremen

PGP: http://www.bccms.uni-bremen.de/cms/people/c_koehler/
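One way to check the "we are using the PSM2 mtl I believe" assumption is to raise the PML/MTL selection verbosity on a small run; a sketch (verbosity levels and the grep pattern are just one possible choice):

# Sketch: report which PML/MTL components Open MPI actually selects at startup.
mpirun -n 4 --mca pml_base_verbose 10 --mca mtl_base_verbose 10 \
    -host node009,node010,node009,node010 ./xdsyevr 2>&1 | grep -i -e select -e psm2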
Re: [OMPI users] ScaLapack tester fails with 2.0.1, works with 1.10.4; Intel Omni-Path
Hi Christof,

Thanks for trying out 2.0.1. Sorry that you're hitting problems.

Could you try to run the tests using the 'ob1' PML in order to bypass PSM2?

mpirun --mca pml ob1 (all the rest of the args)

and see if you still observe the failures?

Howard

2016-11-18 9:32 GMT-07:00 Christof Köhler <christof.koeh...@bccms.uni-bremen.de>:

> Hello everybody,
>
> I am observing failures in the xdsyevr (and xssyevr) ScaLapack self tests
> when running on one or two nodes with OpenMPI 2.0.1. With 1.10.4 no
> failures are observed. Also, with mvapich2 2.2 no failures are observed.
> The other testers appear to be working with all MPIs mentioned (have to
> triple check again). I somehow overlooked the failures below at first.
>
> The system is an Intel OmniPath system (newest Intel driver release 10.2),
> i.e. we are using the PSM2 mtl I believe.
>
> I built the OpenMPIs with gcc 6.2 and the following identical options:
> ./configure FFLAGS="-O1" CFLAGS="-O1" FCFLAGS="-O1" CXXFLAGS="-O1"
> --with-psm2 --with-tm --with-hwloc=internal --enable-static
> --enable-orterun-prefix-by-default
>
> The ScaLapack build is also with gcc 6.2, openblas 0.2.19 and using "-O1
> -g" as FCFLAGS and CCFLAGS identical for all tests, only wrapper compiler
> changes.
>
> With OpenMPI 1.10.4 I see on a single node
>
> mpirun -n 4 -x PATH -x LD_LIBRARY_PATH -x OMP_NUM_THREADS -mca
> oob_tcp_if_include eth0,team0 -host node009,node009,node009,node009
> ./xdsyevr
> 136 tests completed and passed residual checks.
> 0 tests completed without checking.
> 0 tests skipped for lack of memory.
> 0 tests completed and failed.
>
> With OpenMPI 1.10.4 I see on two nodes
>
> mpirun -n 4 -x PATH -x LD_LIBRARY_PATH -x OMP_NUM_THREADS -mca
> oob_tcp_if_include eth0,team0 -host node009,node010,node009,node010
> ./xdsyevr
> 136 tests completed and passed residual checks.
> 0 tests completed without checking.
> 0 tests skipped for lack of memory.
> 0 tests completed and failed.
>
> With OpenMPI 2.0.1 I see on a single node
>
> mpirun -n 4 -x PATH -x LD_LIBRARY_PATH -x OMP_NUM_THREADS -mca
> oob_tcp_if_include eth0,team0 -host node009,node009,node009,node009
> ./xdsyevr
> 32 tests completed and passed residual checks.
> 0 tests completed without checking.
> 0 tests skipped for lack of memory.
> 104 tests completed and failed.
>
> With OpenMPI 2.0.1 I see on two nodes
>
> mpirun -n 4 -x PATH -x LD_LIBRARY_PATH -x OMP_NUM_THREADS -mca
> oob_tcp_if_include eth0,team0 -host node009,node010,node009,node010
> ./xdsyevr
> 32 tests completed and passed residual checks.
> 0 tests completed without checking.
> 0 tests skipped for lack of memory.
> 104 tests completed and failed.
>
> A typical failure looks like this in the output
>
> IL, IU, VL or VU altered by PDSYEVR
> 500 1 1 1 8 Y 0.26 -1.00 0.19E-02 15.  FAILED
> 500 1 2 1 8 Y 0.29 -1.00 0.79E-03 3.9  PASSED EVR
> IL, IU, VL or VU altered by PDSYEVR
> 500 1 1 2 8 Y 0.52 -1.00 0.82E-03 2.5  FAILED
> 500 1 2 2 8 Y 0.41 -1.00 0.79E-03 2.3  PASSED EVR
> 500 2 2 2 8 Y 0.18 -1.00 0.78E-03 3.0  PASSED EVR
> IL, IU, VL or VU altered by PDSYEVR
> 500 4 1 4 8 Y 0.09 -1.00 0.95E-03 4.1  FAILED
> 500 4 4 1 8 Y 0.11 -1.00 0.91E-03 2.8  PASSED EVR
>
> The variable OMP_NUM_THREADS=1 to stop the openblas from threading.
> We see similar problems with intel 2016 compilers, but I believe gcc is a
> good baseline.
>
> Any ideas? For us this is a real problem in that we do not know if this
> indicates a network (transport) issue in the intel software stack (libpsm2,
> hfi1 kernel module) which might affect our production codes or if this is
> an OpenMPI issue. We have some other problems I might ask about later on
> this list, but nothing which yields such a nice reproducer and especially
> these other problems might well be application related.
>
> Best Regards
>
> Christof
>
> --
> Dr. rer. nat. Christof Köhler       email: c.koeh...@bccms.uni-bremen.de
> Universitaet Bremen/ BCCMS          phone: +49-(0)421-218-62334
> Am Fallturm 1/ TAB/ Raum 3.12       fax: +49-(0)421-218-62770
> 28359 Bremen
>
> PGP: http://www.bccms.uni-bremen.de/cms/people/c_koehler/
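For concreteness, plugging ob1 into the command line from the original report would look something like this (arguments taken from the quoted message):

mpirun --mca pml ob1 -n 4 -x PATH -x LD_LIBRARY_PATH -x OMP_NUM_THREADS \
    -mca oob_tcp_if_include eth0,team0 \
    -host node009,node010,node009,node010 ./xdsyevr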