Re: [OMPI users] GPUDirect with OpenMPI

2015-03-03 Thread Rolf vandeVaart
Hi Rob:
Sorry for the slow reply, but it took me a while to figure this out.  It turns 
out that this issue had to do with how some of the memory within the smcuda BTL 
was being registered with CUDA.  This was fixed a few weeks ago and will be 
available in the 1.8.5 release.  Perhaps you could retry with a pre-release 
version of Open MPI 1.8.5, available at the link below, and confirm that it 
fixes your issue.  Any of the tarballs listed on that page should be fine.

http://www.open-mpi.org/nightly/v1.8/
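
Once the nightly build is installed and first in your PATH, a quick sanity
check (ompi_info ships with Open MPI) confirms which version your jobs will
actually pick up:

  which mpirun
  ompi_info | grep "Open MPI:"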

Thanks,
Rolf

From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Rolf vandeVaart
Sent: Wednesday, February 11, 2015 3:50 PM
To: Open MPI Users
Subject: Re: [OMPI users] GPUDirect with OpenMPI

Let me try to reproduce this.  This should not have anything to do with GPU 
Direct RDMA.  However, to eliminate it, you could run with:
--mca btl_openib_want_cuda_gdr 0.
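
For example, based on the run line in Rob's message below (the process counts
and placement flags are simply carried over from there):

  mpirun -np 4 -npernode 2 --mca btl_openib_want_cuda_gdr 0 ./direct.x

If the wrong values go away with GPU Direct RDMA disabled, that narrows it
down to the RDMA path; if they persist, the problem is elsewhere in the CUDA
support.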

Rolf

From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Aulwes, Rob
Sent: Wednesday, February 11, 2015 2:17 PM
To: us...@open-mpi.org
Subject: [OMPI users] GPUDirect with OpenMPI

Hi,

I built OpenMPI 1.8.3 using PGI 14.7 and enabled CUDA support for CUDA 6.0.  I 
have a Fortran test code that tests GPUDirect and have included it here.  When 
I run it across 2 nodes using 4 MPI procs, sometimes it fails with incorrect 
results.  Specifically, sometimes rank 1 does not receive the correct value 
from one of the neighbors.

The code was compiled using PGI 14.7:
mpif90 -o direct.x -acc acc_direct.f90

and executed with:
mpirun -np 4 -npernode 2 -mca btl_openib_want_cuda_gdr 1 ./direct.x

Does anyone know if I'm missing something when using GPUDirect?

Thanks,
Rob Aulwes


program acc_direct

   include 'mpif.h'

   integer :: ierr, rank, nranks
   integer, dimension(:), allocatable :: i_ra

   call mpi_init(ierr)

   call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
   rank = rank + 1
   write(*,*) 'hello from rank ',rank

   call MPI_COMM_SIZE(MPI_COMM_WORLD, nranks, ierr)

   allocate( i_ra(nranks) )

   call nb_exchange

   call mpi_finalize(ierr)

 contains

   ! Each rank posts a nonblocking send of its own value to every other
   ! rank and a matching receive, passing device addresses to MPI via the
   ! OpenACC host_data/use_device directive.
   subroutine nb_exchange

      integer :: i, j, cnt
      integer, dimension(nranks - 1) :: sendreq, recvreq
      logical :: done
      integer :: stat(MPI_STATUS_SIZE)

      i_ra = -1
      i_ra(rank) = rank

      !$acc data copy(i_ra(1:nranks))

      !$acc host_data use_device(i_ra)

      cnt = 0
      do i = 1, nranks
         if ( i .ne. rank ) then
            cnt = cnt + 1

            call MPI_ISEND(i_ra(rank), 1, MPI_INTEGER, i - 1, rank, &
                           MPI_COMM_WORLD, sendreq(cnt), ierr)
            if ( ierr .ne. MPI_SUCCESS ) write(*,*) 'isend call failed.'

            call MPI_IRECV(i_ra(i), 1, MPI_INTEGER, i - 1, i, &
                           MPI_COMM_WORLD, recvreq(cnt), ierr)
            if ( ierr .ne. MPI_SUCCESS ) write(*,*) 'irecv call failed.'
         endif
      enddo

      !$acc end host_data

      ! Poll until every send and receive request has completed.
      i = 0
      do while ( i .lt. 2*cnt )
         do j = 1, cnt
            if ( recvreq(j) .ne. MPI_REQUEST_NULL ) then
               call MPI_TEST(recvreq(j), done, stat, ierr)
               if ( ierr .ne. MPI_SUCCESS ) &
                  write(*,*) 'test for irecv call failed.'
               if ( done ) then
                  i = i + 1
               endif
            endif

            if ( sendreq(j) .ne. MPI_REQUEST_NULL ) then
               call MPI_TEST(sendreq(j), done, MPI_STATUS_IGNORE, ierr)
               if ( ierr .ne. MPI_SUCCESS ) &
                  write(*,*) 'test for isend call failed.'
               if ( done ) then
                  i = i + 1
               endif
            endif
         enddo
      enddo

      write(*,*) rank,': nb_exchange: Updating host...'
      !$acc update host(i_ra(1:nranks))

      ! Each slot should now hold the number of the rank that sent it.
      do j = 1, nranks
         if ( i_ra(j) .ne. j ) then
            write(*,*) 'isend/irecv failed.'
            write(*,*) 'rank', rank,': i_ra(',j,') = ',i_ra(j)
         endif
      enddo

      !$acc end data

   end subroutine

end program



Re: [OMPI users] compiling OpenMPI 1.8.4 on system with multiarched SLURM libs (Ubuntu 15.04 prerelease)

2015-03-03 Thread Lev Givon
Received from Ralph Castain on Sun, Mar 01, 2015 at 10:31:15AM EST:
> > On Feb 26, 2015, at 1:19 PM, Lev Givon  wrote:
> > 
> > Received from Ralph Castain on Thu, Feb 26, 2015 at 04:14:05PM EST:
> >>> On Feb 26, 2015, at 1:07 PM, Lev Givon  wrote:
> >>> 
> >>> I recently tried to build OpenMPI 1.8.4 on a daily release of what will
> >>> eventually become Ubuntu 15.04 (64-bit) with the --with-slurm and
> >>> --with-pmi options on.  I noticed that the libpmi.so.0.0.0 library in
> >>> Ubuntu 15.04 is now in the multiarch location /usr/lib/x86_64-linux-gnu
> >>> rather than /usr/lib; this causes the configure script to complain that
> >>> it can't find libpmi/libpmi2 in /usr/lib or /usr/lib64. Setting
> >>> LDFLAGS=-L/usr/lib/x86_64-linux-gnu and/or
> >>> LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu doesn't seem to help. How can
> >>> I get configure to find the pmi library when it is in a multiarch
> >>> location?
> >> 
> >> Looks like we don’t have a separate pmi-libdir configure option, so it
> >> may not work. I can add one to the master and set to pull it across to
> >> 1.8.5.
> > 
> > That would be great. Another possibility is to add
> > /usr/lib/x86_64-linux-gnu and /usr/lib/i386-linux-gnu to the default
> > libdirs searched when testing for pmi.
>
> 
> Could you please check the nightly 1.8 tarball? I added the pmi-libdir
> option. Having it default to look for x86 etc subdirs is a little too
> system-specific - if that ever becomes a broader standard way of installing
> things, then I'd be more inclined to add it to the default search algo.
> 
> http://www.open-mpi.org/nightly/v1.8/

The libpmi library file in Ubuntu 15.04 is in /usr/lib/x86_64-linux-gnu, not
/usr/lib/x86_64-linux-gnu/lib or /usr/lib/x86_64-linux-gnu/lib64. Could the
pmi-libdir option be modified to use the specified directory as-is rather than
appending lib or lib64 to it?
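
For reference, once the option takes the directory as-is (as requested above),
the intended configure line on this system would look something like the
following sketch, assuming the new option keeps the name Ralph mentions
(--with-pmi-libdir); the installation prefix is illustrative:

  ./configure --prefix=/opt/openmpi-1.8.5 \
      --with-slurm \
      --with-pmi=/usr \
      --with-pmi-libdir=/usr/lib/x86_64-linux-gnu
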
-- 
Lev Givon
Bionet Group | Neurokernel Project
http://www.columbia.edu/~lev/
http://lebedov.github.io/
http://neurokernel.github.io/



Re: [OMPI users] LAM/MPI -> OpenMPI

2015-03-03 Thread Sasso, John (GE Power & Water, Non-GE)
As far as I know, no MPI-IO is done in their LAM/MPI-based apps.
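
A quick way to double-check, assuming the application sources are at hand, is
to grep them for MPI-IO calls (the path below is just a placeholder):

  grep -rli 'mpi_file_' /path/to/app/source

If nothing turns up, ROMIO's age is not a concern for these codes.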

-Original Message-
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Rob Latham
Sent: Friday, February 27, 2015 11:22 AM
To: us...@open-mpi.org
Subject: Re: [OMPI users] LAM/MPI -> OpenMPI



On 02/27/2015 09:40 AM, Ralph Castain wrote:

>> Yeah, any other recommendations I can give to convince the
>> powers-that-be that immediate sun-setting of LAM/MPI would be great.
>> Sometimes I feel like I am trying to fit a square peg in a round hole.
>
> Other than the fact that LAM/MPI no longer is supported, the only real 
> rationale would be that OMPI has a lot of enhancements in terms of 
> binding options and other features, supports thru MPI-3, etc.

Does this application do any I/O?  I was curious so I dug around in LAM's 
Subversion repository.  Last change to ROMIO, the MPI-IO implementation, was 
this one:

r10377 | brbarret | 2007-07-02 21:53:06

so you're missing out on 8 years of I/O-related bug fixes and optimizations.


==rob

--
Rob Latham
Mathematics and Computer Science Division Argonne National Lab, IL USA 


[OMPI users] some warnings for openmpi-dev-1184-gbb22d26

2015-03-03 Thread Siegmar Gross
Hi,

today I tried to build openmpi-dev-1184-gbb22d26 on my machines
(Solaris 10 Sparc, Solaris 10 x86_64, and openSUSE Linux 12.1
x86_64) with gcc-4.9.2 and Sun C 5.13. Perhaps somebody is
interested in some warnings that I got.


tyr openmpi-dev-1184-gbb22d26-SunOS.sparc.64_gcc 57 grep warning \
  log.make.SunOS.sparc.64_gcc | grep -v attempted | sort | uniq
../../../openmpi-dev-1184-gbb22d26/ompi/datatype/ompi_datatype_args.c:68:11:
  warning: assignment makes integer from pointer without a cast
ld: warning: symbol 'mpi_fortran_argv_null' has differing sizes:
ld: warning: symbol 'mpi_fortran_argv_null_' has differing sizes:
ld: warning: symbol 'mpi_fortran_argvs_null' has differing sizes:
ld: warning: symbol 'mpi_fortran_argvs_null_' has differing sizes:
ld: warning: symbol 'mpi_fortran_errcodes_ignore' has differing sizes:
ld: warning: symbol 'mpi_fortran_errcodes_ignore_' has differing sizes:
ld: warning: symbol 'mpi_fortran_status_ignore' has differing sizes:
ld: warning: symbol 'mpi_fortran_status_ignore_' has differing sizes:
ld: warning: symbol 'mpi_fortran_statuses_ignore' has differing sizes:
ld: warning: symbol 'mpi_fortran_statuses_ignore_' has differing sizes:



linpc1 openmpi-dev-1184-gbb22d26-Linux.x86_64.64_gcc 165 grep warning \
  log.make.Linux.x86_64.64_gcc | grep -v attempted | sort | uniq
/usr/include/netlink/object.h:58:23: warning: inline function
  'nl_object_priv' declared but never defined



tyr openmpi-dev-1184-gbb22d26-SunOS.sparc.64_cc 56 grep warning \
  log.make.SunOS.sparc.64_cc | grep -v attempted | sort | uniq
"../../../../../../../openmpi-dev-1184-gbb22d26/opal/mca/hwloc/hwloc191/hwloc/src/topology-custom.c",
  line 88: warning: initializer will be sign-extended: -1
"../../../../../../../openmpi-dev-1184-gbb22d26/opal/mca/hwloc/hwloc191/hwloc/src/topology-synthetic.c",
  line 432: warning: initializer will be sign-extended: -1
"../../../../../../../openmpi-dev-1184-gbb22d26/opal/mca/hwloc/hwloc191/hwloc/src/topology-xml.c",
  line 1518: warning: initializer will be sign-extended: -1
"../../../../../../openmpi-dev-1184-gbb22d26/ompi/mca/io/romio/romio/adio/common/ad_fstype.c",
  line 310: warning: statement not reached
"../../../openmpi-dev-1184-gbb22d26/ompi/datatype/ompi_datatype_args.c",
  line 512: warning: improper pointer/integer combination: op "="



linpc1 openmpi-dev-1184-gbb22d26-Linux.x86_64.64_cc 153 grep warning \
  log.make.Linux.x86_64.64_cc | grep -v attempted | \
  grep -v atomic.h | sort | uniq
"../../../../../../../openmpi-dev-1184-gbb22d26/opal/mca/hwloc/hwloc191/hwloc/src/topology-custom.c",
  line 88: warning: initializer will be sign-extended: -1
"../../../../../../../openmpi-dev-1184-gbb22d26/opal/mca/hwloc/hwloc191/hwloc/src/topology-linux.c",
  line 2528: warning: initializer will be sign-extended: -1
"../../../../../../../openmpi-dev-1184-gbb22d26/opal/mca/hwloc/hwloc191/hwloc/src/topology-synthetic.c",
  line 432: warning: initializer will be sign-extended: -1
"../../../../../../../openmpi-dev-1184-gbb22d26/opal/mca/hwloc/hwloc191/hwloc/src/topology-xml.c",
  line 1518: warning: initializer will be sign-extended: -1
"../../../../../openmpi-dev-1184-gbb22d26/opal/mca/reachable/netlink/reachable_netlink_utils_common.c",
  line 322: warning: extern inline function "nl_object_priv" not defined in 
translation unit


Kind regards

Siegmar