Re: [OMPI users] [EXTERNAL] Re: Using shmem_int_fadd() in OpenMPI's SHMEM

2017-11-22 Thread Howard Pritchard
Hi Ben,

Actually, I did some checking about the brew install for OFI libfabric.
It looks like if your brew is up to date, it will pick up libfabric 1.5.2.
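
For example (assuming your Homebrew is up to date and the libfabric
formula is available):

brew update
brew install libfabric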

Howard



Re: [OMPI users] [EXTERNAL] Re: Using shmem_int_fadd() in OpenMPI's SHMEM

2017-11-22 Thread Howard Pritchard
Hi Ben,

Even on one box, the yoda component doesn't work any more.

If you want to do OpenSHMEM programming on your MacBook Pro (like I do)
and you don't want to set up a VM to use UCX, then you can use the
Sandia OpenSHMEM (SOS) implementation:

https://github.com/Sandia-OpenSHMEM/SOS

You will need to install the MPICH Hydra launcher

http://www.mpich.org/downloads/versions/

as SOS needs it for its oshrun launcher.

I use hydra-3.2 on my Mac with SOS.

You will also need to install OFI libfabric:

https://github.com/ofiwg/libfabric

I'd suggest installing the OFI 1.5.1 tarball.  OFI is also available via
brew, but it's so old that I doubt it will work with recent versions of SOS.
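
As a rough sketch, the whole build sequence looks something like this
(install prefixes and version numbers are placeholders; check each
project's README for the exact configure options):

# build the hydra launcher (provides the process manager oshrun uses)
cd hydra-3.2
./configure --prefix=$HOME/install/hydra && make install

# build OFI libfabric
cd ../libfabric-1.5.1
./configure --prefix=$HOME/install/libfabric && make install

# build SOS against libfabric
cd ../SOS
./autogen.sh
./configure --prefix=$HOME/install/sos --with-ofi=$HOME/install/libfabric
make install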

If you'd like to use UCX, you'll need to install it and Open MPI on a VM
running a Linux distro.

Howard


2017-11-21 12:47 GMT-07:00 Benjamin Brock :

> > What version of Open MPI are you trying to use?
>
> Open MPI 2.1.1-2 as distributed by Arch Linux.
>
> > Also, could you describe something about your system.
>
> This is all in shared memory on a MacBook Pro; no networking involved.
>
> The seg fault with the code example above looks like this:
>
> [xiii@shini kmer_hash]$ g++ minimal.cpp -o minimal `shmemcc --showme:link`
> [xiii@shini kmer_hash]$ !shm
> shmemrun -n 2 ./minimal
> [shini:08284] *** Process received signal ***
> [shini:08284] Signal: Segmentation fault (11)
> [shini:08284] Signal code: Address not mapped (1)
> [shini:08284] Failing at address: 0x18
> [shini:08284] [ 0] /usr/lib/libpthread.so.0(+0x11da0)[0x7f06fb763da0]
> [shini:08284] [ 1] /usr/lib/openmpi/openmpi/mca_spml_yoda.so(mca_spml_yoda_get+0x7da)[0x7f06e0eef0aa]
> [shini:08284] [ 2] /usr/lib/openmpi/openmpi/mca_atomic_basic.so(atomic_basic_lock+0xb2)[0x7f06e08d90d2]
> [shini:08284] [ 3] /usr/lib/openmpi/openmpi/mca_atomic_basic.so(mca_atomic_basic_fadd+0x4a)[0x7f06e08d949a]
> [shini:08284] [ 4] /usr/lib/openmpi/liboshmem.so.20(shmem_int_fadd+0x90)[0x7f06fc5a7660]
> [shini:08284] [ 5] ./minimal(+0x94f)[0x55a5cde7e94f]
> [shini:08284] [ 6] /usr/lib/libc.so.6(__libc_start_main+0xea)[0x7f06fb3baf6a]
> [shini:08284] [ 7] ./minimal(+0x80a)[0x55a5cde7e80a]
> [shini:08284] *** End of error message ***
> --------------------------------------------------------------------------
> shmemrun noticed that process rank 1 with PID 0 on node shini exited on
> signal 11 (Segmentation fault).
> --------------------------------------------------------------------------
>
> Cheers,
>
> Ben
>

Re: [OMPI users] [EXTERNAL] Re: Using shmem_int_fadd() in OpenMPI's SHMEM

2017-11-22 Thread Howard Pritchard
Hi Folks,

For the Open MPI 2.1.1 release, the only OSHMEM SPMLs that work are ikrit
and ucx; yoda doesn't work.

ikrit only works on systems with Mellanox interconnects and requires MXM
to be installed. It is recommended for systems with ConnectX-3 or older
HCAs; for systems with ConnectX-4 or ConnectX-5 you should be using UCX.

You'll need to add --with-ucx (plus arguments as required) to the
configure command line when you build Open MPI/OSHMEM to pick up the ucx
stuff.
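
For example, something like the following (the install prefixes here are
just placeholders):

./configure --prefix=$HOME/install/ompi --with-ucx=$HOME/install/ucx
make && make install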

A gotcha is that, by default, the ucx spml is not selected, so either add
this to the oshrun command line:

--mca spml ucx

or set it via an environment variable:

export OMPI_MCA_spml=ucx
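
A full oshrun invocation then looks something like this (the binary name
is just a placeholder for your test program):

oshrun --mca spml ucx -n 2 ./minimal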

I verified that a 2.1.1 release + UCX 1.2.0 builds your test (after fixing
the unusual include files) and that it passes on my Mellanox ConnectX-5
cluster.

Howard


2017-11-21 8:24 GMT-07:00 Hammond, Simon David :

> Hi Howard/OpenMPI Users,
>
> I have had a similar seg-fault this week using OpenMPI 2.1.1 with GCC
> 4.9.3, so I tried to compile the example code in the email below. I see
> similar behavior with a small benchmark we have in house (but using inc,
> not finc).
>
> When I run on a single node (both PEs on the same node) I get the error
> below, but if I run on multiple nodes (say, two nodes with one PE per
> node) the code runs fine. The same goes for my benchmark, which uses
> shmem_longlong_inc. For reference, we are using InfiniBand on our cluster
> and dual-socket Haswell processors.
>
> Hope that helps,
>
> S.
>
> $ shmemrun -n 2 ./testfinc
> --------------------------------------------------------------------------
> WARNING: There is at least non-excluded one OpenFabrics device found,
> but there are no active ports detected (or Open MPI was unable to use
> them).  This is most certainly not what you wanted.  Check your
> cables, subnet manager configuration, etc.  The openib BTL will be
> ignored for this job.
>
>   Local host: shepard-lsm1
> --------------------------------------------------------------------------
> [shepard-lsm1:49505] *** Process received signal ***
> [shepard-lsm1:49505] Signal: Segmentation fault (11)
> [shepard-lsm1:49505] Signal code: Address not mapped (1)
> [shepard-lsm1:49505] Failing at address: 0x18
> [shepard-lsm1:49505] [ 0] /lib64/libpthread.so.0(+0xf710)[0x7ffc4cd9e710]
> [shepard-lsm1:49505] [ 1] /home/projects/x86-64-haswell/openmpi/2.1.1/gcc/4.9.3/lib/openmpi/mca_spml_yoda.so(mca_spml_yoda_get+0x86d)[0x7ffc337cf37d]
> [shepard-lsm1:49505] [ 2] /home/projects/x86-64-haswell/openmpi/2.1.1/gcc/4.9.3/lib/openmpi/mca_atomic_basic.so(atomic_basic_lock+0x9a)[0x7ffc32f190aa]
> [shepard-lsm1:49505] [ 3] /home/projects/x86-64-haswell/openmpi/2.1.1/gcc/4.9.3/lib/openmpi/mca_atomic_basic.so(mca_atomic_basic_fadd+0x39)[0x7ffc32f19409]
> [shepard-lsm1:49505] [ 4] /home/projects/x86-64-haswell/openmpi/2.1.1/gcc/4.9.3/lib/liboshmem.so.20(shmem_int_fadd+0x80)[0x7ffc4d2fc110]
> [shepard-lsm1:49505] [ 5] ./testfinc[0x400888]
> [shepard-lsm1:49505] [ 6] /lib64/libc.so.6(__libc_start_main+0xfd)[0x7ffc4ca19d5d]
> [shepard-lsm1:49505] [ 7] ./testfinc[0x400739]
> [shepard-lsm1:49505] *** End of error message ***
> --------------------------------------------------------------------------
> shmemrun noticed that process rank 1 with PID 0 on node shepard-lsm1
> exited on signal 11 (Segmentation fault).
> --------------------------------------------------------------------------
> [shepard-lsm1:49499] 1 more process has sent help message
> help-mpi-btl-openib.txt / no active ports found
> [shepard-lsm1:49499] Set MCA parameter "orte_base_help_aggregate" to 0 to
> see all help / error messages
>
> --
> Si Hammond
> Scalable Computer Architectures
> Sandia National Laboratories, NM, USA
>
> From: users on behalf of Howard Pritchard
> Reply-To: Open MPI Users
> Date: Monday, November 20, 2017 at 4:11 PM
> To: Open MPI Users
> Subject: [EXTERNAL] Re: [OMPI users] Using shmem_int_fadd() in OpenMPI's SHMEM
>
> Hi Ben,
>
> What version of Open MPI are you trying to use?
>
> Also, could you describe something about your system. If it's a cluster,
> what sort of interconnect is being used?
>
> Howard
>
> 2017-11-20 14:13 GMT-07:00 Benjamin Brock :
>
> What's the proper way to use shmem_int_fadd() in OpenMPI's SHMEM?
>
> A minimal example seems to seg fault:
>
> #include <stdio.h>
> #include <stdlib.h>
>
> #include <shmem.h>
>
> int main(int argc, char **argv) {
>   shmem_init();
>   const size_t shared_segment_size = 1024;
>   void *shared_segment = shmem_malloc(shared_segment_size);
>
>   int *arr = (int *) shared_segment;
>   int *local_arr = (int *) malloc(sizeof(int) * 10);
>
>   /* PE 1 atomically fetch-and-adds 1 to PE 0's symmetric segment */
>   if (shmem_my_pe() == 1) {
>     shmem_int_fadd((int *) shared_segment, 1, 0);
>   }
>   shmem_barrier_all();
>
>   return 0;
> }
>
> Where am I going wrong here?  This sort of thing works in Cray SHMEM.