[OMPI users] valgrind invalid read

2016-11-18 Thread Yann Jobic

Hi,

I'm using valgrind 3.12 with openmpi 2.0.1.
The code simply sends an integer from one process to another:
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char **argv) {
  const int tag = 13;
  int size, rank;

  MPI_Init(&argc, &argv);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  if (size < 2) {
    fprintf(stderr, "Requires at least two processes.\n");
    exit(-1);
  }

  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  if (rank == 0) {
    int i = 3;
    const int dest = 1;

    MPI_Send(&i, 1, MPI_INT, dest, tag, MPI_COMM_WORLD);

    printf("Rank %d: sent int\n", rank);
  }
  if (rank == 1) {
    int j;
    const int src = 0;
    MPI_Status status;

    MPI_Recv(&j, 1, MPI_INT, src, tag, MPI_COMM_WORLD, &status);
    printf("Rank %d: Received: int = %d\n", rank, j);
  }

  MPI_Finalize();

  return 0;
}
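
For reference, basic.c was built with debug info (otherwise valgrind could
not resolve the basic.c line references below); something along the lines of:

mpicc -g basic.c -o exe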


I'm getting the error:
valgrind MPI wrappers 46313: Active for pid 46313
valgrind MPI wrappers 46313: Try MPIWRAP_DEBUG=help for possible options
valgrind MPI wrappers 46314: Active for pid 46314
valgrind MPI wrappers 46314: Try MPIWRAP_DEBUG=help for possible options
Rank 0: sent int
==46314== Invalid read of size 4
==46314==    at 0x400A3D: main (basic.c:33)
==46314==  Address 0xffefff594 is on thread 1's stack
==46314==  in frame #0, created by main (basic.c:5)
==46314==
Rank 1: Received: int = 3

The invalid read is at the printf line (the read of j after MPI_Recv).

Do you have any clue why I am getting it?

I ran the code with:
LD_PRELOAD=$prefix/lib/valgrind/libmpiwrap-amd64-linux.so mpirun -np 2 $prefix/bin/valgrind ./exe
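
In case this is a false positive from the receive path, would marking the
buffer as defined be the right workaround? A minimal sketch of what I mean,
using the VALGRIND_MAKE_MEM_DEFINED client request from valgrind/memcheck.h
(purely an illustration on my side, not a confirmed fix):

#include <valgrind/memcheck.h>

    MPI_Recv(&j, 1, MPI_INT, src, tag, MPI_COMM_WORLD, &status);
    /* if the report is a false positive, tell memcheck the received
       bytes are now initialized so the read in printf is not flagged */
    VALGRIND_MAKE_MEM_DEFINED(&j, sizeof(j));
    printf("Rank %d: Received: int = %d\n", rank, j);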


Thanks in advance,

Yann



Re: [OMPI users] Using custom version of gfortran in mpifort

2016-11-18 Thread Jeff Squyres (jsquyres)
> On Nov 18, 2016, at 2:54 AM, Mahmood Naderan wrote:
> 
> The mpifort wrapper uses the default gfortran compiler on the system. How can 
> I give it another version of gfortran which has been installed in another 
> folder?

The best way is to specify the compiler(s) that you want Open MPI to use when 
you configure/build Open MPI itself:

https://www.open-mpi.org/faq/?category=building#build-compilers

That will propagate your compiler choice down into the wrapper compilers.
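
For example, if the gfortran you want lives in /opt/gcc-x.y (substitute
your actual install path):

    ./configure FC=/opt/gcc-x.y/bin/gfortran CC=gcc CXX=g++ ...
    make all install

FC, CC, and CXX are the standard configure variables; after that, mpifort
will invoke that gfortran.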

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI users] openmpi-2.0.1

2016-11-18 Thread Jeff Squyres (jsquyres)
On Nov 17, 2016, at 3:43 PM, Gilles Gouaillardet wrote:
> 
> if it still does not work, you can
> cd ompi/tools
> make V=1
> 
> and post the output

Let me add to that: if that doesn't work, please send all the information 
listed here:

https://www.open-mpi.org/community/help/

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



[OMPI users] ScaLapack tester fails with 2.0.1, works with 1.10.4; Intel Omni-Path

2016-11-18 Thread Christof Köhler

Hello everybody,

I am observing failures in the xdsyevr (and xssyevr) ScaLapack self
tests when running on one or two nodes with OpenMPI 2.0.1. With 1.10.4
no failures are observed. Also, with mvapich2 2.2 no failures are
observed. The other testers appear to be working with all MPIs
mentioned (I have to triple-check again); I somehow overlooked the
failures below at first.

The system is an Intel Omni-Path system (newest Intel driver release
10.2), i.e. we are using the PSM2 MTL, I believe.

I built the OpenMPIs with gcc 6.2 and the following identical options:
./configure  FFLAGS="-O1" CFLAGS="-O1" FCFLAGS="-O1" CXXFLAGS="-O1"  
--with-psm2 --with-tm --with-hwloc=internal --enable-static  
--enable-orterun-prefix-by-default


The ScaLapack build also uses gcc 6.2 and openblas 0.2.19, with
"-O1 -g" as FCFLAGS and CCFLAGS, identical for all tests; only the
wrapper compiler changes (see the SLmake.inc sketch below).
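
(In SLmake.inc terms, assuming the stock ScaLapack 2.x build system,
something like:

FC      = mpif90
CC      = mpicc
FCFLAGS = -O1 -g
CCFLAGS = -O1 -g

with the wrappers swapped between the MPI installations under test.)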


With OpenMPI 1.10.4 I see on a single node

mpirun -n 4 -x PATH -x LD_LIBRARY_PATH -x OMP_NUM_THREADS -mca oob_tcp_if_include eth0,team0 -host node009,node009,node009,node009 ./xdsyevr

  136 tests completed and passed residual checks.
    0 tests completed without checking.
    0 tests skipped for lack of memory.
    0 tests completed and failed.

With OpenMPI 1.10.4 I see on two nodes

mpirun -n 4 -x PATH -x LD_LIBRARY_PATH -x OMP_NUM_THREADS -mca oob_tcp_if_include eth0,team0 -host node009,node010,node009,node010 ./xdsyevr

  136 tests completed and passed residual checks.
    0 tests completed without checking.
    0 tests skipped for lack of memory.
    0 tests completed and failed.

With OpenMPI 2.0.1 I see on a single node

mpirun -n 4 -x PATH -x LD_LIBRARY_PATH -x OMP_NUM_THREADS -mca oob_tcp_if_include eth0,team0 -host node009,node009,node009,node009 ./xdsyevr

   32 tests completed and passed residual checks.
    0 tests completed without checking.
    0 tests skipped for lack of memory.
  104 tests completed and failed.

With OpenMPI 2.0.1 I see on two nodes

mpirun -n 4 -x PATH -x LD_LIBRARY_PATH -x OMP_NUM_THREADS -mca oob_tcp_if_include eth0,team0 -host node009,node010,node009,node010 ./xdsyevr

   32 tests completed and passed residual checks.
    0 tests completed without checking.
    0 tests skipped for lack of memory.
  104 tests completed and failed.

A typical failure looks like this in the output:

IL, IU, VL or VU altered by PDSYEVR
  500   1   1   1   8   Y   0.26  -1.00   0.19E-02   15.   FAILED
  500   1   2   1   8   Y   0.29  -1.00   0.79E-03   3.9   PASSED   EVR
IL, IU, VL or VU altered by PDSYEVR
  500   1   1   2   8   Y   0.52  -1.00   0.82E-03   2.5   FAILED
  500   1   2   2   8   Y   0.41  -1.00   0.79E-03   2.3   PASSED   EVR
  500   2   2   2   8   Y   0.18  -1.00   0.78E-03   3.0   PASSED   EVR
IL, IU, VL or VU altered by PDSYEVR
  500   4   1   4   8   Y   0.09  -1.00   0.95E-03   4.1   FAILED
  500   4   4   1   8   Y   0.11  -1.00   0.91E-03   2.8   PASSED   EVR


The variable OMP_NUM_THREADS is set to 1 to stop openblas from threading.
We see similar problems with the Intel 2016 compilers, but I believe gcc
is a good baseline.
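
(In bash that is just:

export OMP_NUM_THREADS=1

which mpirun then forwards to the remote ranks via "-x OMP_NUM_THREADS".)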


Any ideas? For us this is a real problem: we do not know whether this
indicates a network (transport) issue in the Intel software stack
(libpsm2, hfi1 kernel module), which might affect our production codes,
or whether it is an OpenMPI issue. We have some other problems I might
ask about later on this list, but nothing that yields such a nice
reproducer, and those other problems might well be application related.


Best Regards

Christof

--
Dr. rer. nat. Christof Köhler   email: c.koeh...@bccms.uni-bremen.de
Universitaet Bremen/ BCCMS  phone:  +49-(0)421-218-62334
Am Fallturm 1/ TAB/ Raum 3.12   fax: +49-(0)421-218-62770
28359 Bremen

PGP: http://www.bccms.uni-bremen.de/cms/people/c_koehler/


Re: [OMPI users] ScaLapack tester fails with 2.0.1, works with 1.10.4; Intel Omni-Path

2016-11-18 Thread Howard Pritchard
Hi Christof,

Thanks for trying out 2.0.1.  Sorry that you're hitting problems.
Could you try to run the tests using the 'ob1' PML in order to
bypass PSM2?

mpirun --mca pml ob1 (all the rest of the args)

and see if you still observe the failures?
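
Concretely, with the arguments from your earlier two-node runs, that
would be something like:

mpirun --mca pml ob1 -n 4 -x PATH -x LD_LIBRARY_PATH -x OMP_NUM_THREADS -mca oob_tcp_if_include eth0,team0 -host node009,node010,node009,node010 ./xdsyevr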

Howard


2016-11-18 9:32 GMT-07:00 Christof Köhler <christof.koeh...@bccms.uni-bremen.de>:

> Hello everybody,
>
> I am observing failures in the xdsyevr (and xssyevr) ScaLapack self tests
> when running on one or two nodes with OpenMPI 2.0.1. With 1.10.4 no
> failures are observed. Also, with mvapich2 2.2 no failures are observed.
> [...]