Re: [OMPI users] [Open MPI Announce] Open MPI 4.0.0 Released

2018-11-13 Thread Bert Wesarg via users
Dear Takahiro,
On Wed, Nov 14, 2018 at 5:38 AM Kawashima, Takahiro
 wrote:
>
> XPMEM moved to GitLab.
>
> https://gitlab.com/hjelmn/xpmem

the first words from the README aren't very pleasant to read:

This is an experimental version of XPMEM based on a version provided by
Cray and uploaded to https://code.google.com/p/xpmem. This version supports
any kernel 3.12 and newer. *Keep in mind there may be bugs and this version
may cause kernel panics, code crashes, eat your cat, etc.*

I want to install this on my laptop, where I just want to develop with SHMEM;
it would be a pity to lose work just because of that.

Best,
Bert

>
> Thanks,
> Takahiro Kawashima,
> Fujitsu
>
> > Hello Bert,
> >
> > What OS are you running on your notebook?
> >
> > If you are running Linux, and you have root access to your system,  then
> > you should be able to resolve the Open SHMEM support issue by installing
> > the XPMEM device driver on your system, and rebuilding UCX so it picks
> > up XPMEM support.
> >
> > The source code is on GitHub:
> >
> > https://github.com/hjelmn/xpmem
> >
> > Some instructions on how to build the xpmem device driver are at
> >
> > https://github.com/hjelmn/xpmem/wiki/Installing-XPMEM
> >
> > You will need to install the kernel source and symbols rpms on your
> > system before building the xpmem device driver.
> >
> > Hope this helps,
> >
> > Howard
> >
> >
> > Am Di., 13. Nov. 2018 um 15:00 Uhr schrieb Bert Wesarg via users <
> > users@lists.open-mpi.org>:
> >
> > > Hi,
> > >
> > > On Mon, Nov 12, 2018 at 10:49 PM Pritchard Jr., Howard via announce
> > >  wrote:
> > > >
> > > > The Open MPI Team, representing a consortium of research, academic, and
> > > > industry partners, is pleased to announce the release of Open MPI 
> > > > version
> > > > 4.0.0.
> > > >
> > > > v4.0.0 is the start of a new release series for Open MPI.  Starting with
> > > > this release, the OpenIB BTL supports only iWarp and RoCE by default.
> > > > Starting with this release,  UCX is the preferred transport protocol
> > > > for Infiniband interconnects. The embedded PMIx runtime has been updated
> > > > to 3.0.2.  The embedded Romio has been updated to 3.2.1.  This
> > > > release is ABI compatible with the 3.x release streams. There have been
> > > numerous
> > > > other bug fixes and performance improvements.
> > > >
> > > > Note that starting with Open MPI v4.0.0, prototypes for several
> > > > MPI-1 symbols that were deleted in the MPI-3.0 specification
> > > > (which was published in 2012) are no longer available by default in
> > > > mpi.h. See the README for further details.
> > > >
> > > > Version 4.0.0 can be downloaded from the main Open MPI web site:
> > > >
> > > >   https://www.open-mpi.org/software/ompi/v4.0/
> > > >
> > > >
> > > > 4.0.0 -- September, 2018
> > > > 
> > > >
> > > > - OSHMEM updated to the OpenSHMEM 1.4 API.
> > > > - Do not build OpenSHMEM layer when there are no SPMLs available.
> > > >   Currently, this means the OpenSHMEM layer will only build if
> > > >   a MXM or UCX library is found.
> > >
> > > so what is the most convenient way to get SHMEM working on a single
> > > shared-memory node (i.e. a notebook)? I just realized that I haven't had
> > > a working SHMEM since Open MPI 3.0. Building with UCX does not help
> > > either; I tried with UCX 1.4, but Open MPI SHMEM
> > > still does not work:
> > >
> > > $ oshcc -o shmem_hello_world-4.0.0 openmpi-4.0.0/examples/hello_oshmem_c.c
> > > $ oshrun -np 2 ./shmem_hello_world-4.0.0
> > > [1542109710.217344] [tudtug:27715:0] select.c:406  UCX  ERROR
> > > no remote registered memory access transport to tudtug:27716:
> > > self/self - Destination is unreachable, tcp/enp0s31f6 - no put short,
> > > tcp/wlp61s0 - no put short, mm/sysv - Destination is unreachable,
> > > mm/posix - Destination is unreachable, cma/cma - no put short
> > > [1542109710.217344] [tudtug:27716:0] select.c:406  UCX  ERROR
> > > no remote registered memory access transport to tudtug:27715:
> > > self/self - Destination is unreachable, tcp/enp0s31f6 - no put short,
> > > tcp/wlp61s0 - no put short, mm/sysv - Destination is unreachable,
> > > mm/posix - Destination is unreachable, cma/cma - no put short
> > > [tudtug:27715] ../../../../../oshmem/mca/spml/ucx/spml_ucx.c:266
> > > Error: ucp_ep_create(proc=1/2) failed: Destination is unreachable
> > > [tudtug:27715] ../../../../../oshmem/mca/spml/ucx/spml_ucx.c:305
> > > Error: add procs FAILED rc=-2
> > > [tudtug:27716] ../../../../../oshmem/mca/spml/ucx/spml_ucx.c:266
> > > Error: ucp_ep_create(proc=1/2) failed: Destination is unreachable
> > > [tudtug:27716] ../../../../../oshmem/mca/spml/ucx/spml_ucx.c:305
> > > Error: add procs FAILED rc=-2
> > > --
> > > It looks like SHMEM_INIT failed for some reason; your parallel process is
> > > likely to abort.  There are many reasons that a parallel process can
> > > fail during 

Re: [OMPI users] [Open MPI Announce] Open MPI 4.0.0 Released

2018-11-13 Thread Bert Wesarg via users
Howard,
On Wed, Nov 14, 2018 at 5:26 AM Howard Pritchard  wrote:
>
> Hello Bert,
>
> What OS are you running on your notebook?

Ubuntu 18.04

>
> If you are running Linux, and you have root access to your system,  then
> you should be able to resolve the Open SHMEM support issue by installing
> the XPMEM device driver on your system, and rebuilding UCX so it picks
> up XPMEM support.
>
> The source code is on GitHub:
>
> https://github.com/hjelmn/xpmem
>
> Some instructions on how to build the xpmem device driver are at
>
> https://github.com/hjelmn/xpmem/wiki/Installing-XPMEM
>
> You will need to install the kernel source and symbols rpms on your
> system before building the xpmem device driver.

I will try that. I already tried KNEM, which also did not work.
Though that definitely leaves the realm of convenience behind. For a
development machine where performance doesn't matter, it's a huge step
back for Open MPI, I think.

I will report back if that works.

Thanks.

Best,
Bert

>
> Hope this helps,
>
> Howard
>
>
> Am Di., 13. Nov. 2018 um 15:00 Uhr schrieb Bert Wesarg via users 
> :
>>
>> Hi,
>>
>> On Mon, Nov 12, 2018 at 10:49 PM Pritchard Jr., Howard via announce
>>  wrote:
>> >
>> > The Open MPI Team, representing a consortium of research, academic, and
>> > industry partners, is pleased to announce the release of Open MPI version
>> > 4.0.0.
>> >
>> > v4.0.0 is the start of a new release series for Open MPI.  Starting with
>> > this release, the OpenIB BTL supports only iWarp and RoCE by default.
>> > Starting with this release,  UCX is the preferred transport protocol
>> > for Infiniband interconnects. The embedded PMIx runtime has been updated
>> > to 3.0.2.  The embedded Romio has been updated to 3.2.1.  This
>> > release is ABI compatible with the 3.x release streams. There have been 
>> > numerous
>> > other bug fixes and performance improvements.
>> >
>> > Note that starting with Open MPI v4.0.0, prototypes for several
>> > MPI-1 symbols that were deleted in the MPI-3.0 specification
>> > (which was published in 2012) are no longer available by default in
>> > mpi.h. See the README for further details.
>> >
>> > Version 4.0.0 can be downloaded from the main Open MPI web site:
>> >
>> >   https://www.open-mpi.org/software/ompi/v4.0/
>> >
>> >
>> > 4.0.0 -- September, 2018
>> > 
>> >
>> > - OSHMEM updated to the OpenSHMEM 1.4 API.
>> > - Do not build OpenSHMEM layer when there are no SPMLs available.
>> >   Currently, this means the OpenSHMEM layer will only build if
>> >   a MXM or UCX library is found.
>>
>> so what is the most convenient way to get SHMEM working on a single
>> shared-memory node (i.e. a notebook)? I just realized that I haven't had
>> a working SHMEM since Open MPI 3.0. Building with UCX does not help
>> either; I tried with UCX 1.4, but Open MPI SHMEM
>> still does not work:
>>
>> $ oshcc -o shmem_hello_world-4.0.0 openmpi-4.0.0/examples/hello_oshmem_c.c
>> $ oshrun -np 2 ./shmem_hello_world-4.0.0
>> [1542109710.217344] [tudtug:27715:0] select.c:406  UCX  ERROR
>> no remote registered memory access transport to tudtug:27716:
>> self/self - Destination is unreachable, tcp/enp0s31f6 - no put short,
>> tcp/wlp61s0 - no put short, mm/sysv - Destination is unreachable,
>> mm/posix - Destination is unreachable, cma/cma - no put short
>> [1542109710.217344] [tudtug:27716:0] select.c:406  UCX  ERROR
>> no remote registered memory access transport to tudtug:27715:
>> self/self - Destination is unreachable, tcp/enp0s31f6 - no put short,
>> tcp/wlp61s0 - no put short, mm/sysv - Destination is unreachable,
>> mm/posix - Destination is unreachable, cma/cma - no put short
>> [tudtug:27715] ../../../../../oshmem/mca/spml/ucx/spml_ucx.c:266
>> Error: ucp_ep_create(proc=1/2) failed: Destination is unreachable
>> [tudtug:27715] ../../../../../oshmem/mca/spml/ucx/spml_ucx.c:305
>> Error: add procs FAILED rc=-2
>> [tudtug:27716] ../../../../../oshmem/mca/spml/ucx/spml_ucx.c:266
>> Error: ucp_ep_create(proc=1/2) failed: Destination is unreachable
>> [tudtug:27716] ../../../../../oshmem/mca/spml/ucx/spml_ucx.c:305
>> Error: add procs FAILED rc=-2
>> --
>> It looks like SHMEM_INIT failed for some reason; your parallel process is
>> likely to abort.  There are many reasons that a parallel process can
>> fail during SHMEM_INIT; some of which are due to configuration or environment
>> problems.  This failure appears to be an internal failure; here's some
>> additional information (which may only be relevant to an Open SHMEM
>> developer):
>>
>>   SPML add procs failed
>>   --> Returned "Out of resource" (-2) instead of "Success" (0)
>> --
>> [tudtug:27715] Error: pshmem_init.c:80 - _shmem_init() SHMEM failed to
>> initialize - aborting
>> [tudtug:27716] Error: pshmem_init.c:80 - _shmem_init() SHMEM failed to
>> initialize - 

Re: [OMPI users] [Open MPI Announce] Open MPI 4.0.0 Released

2018-11-13 Thread Kawashima, Takahiro
XPMEM moved to GitLab.

https://gitlab.com/hjelmn/xpmem

Thanks,
Takahiro Kawashima,
Fujitsu

> Hello Bert,
> 
> What OS are you running on your notebook?
> 
> If you are running Linux, and you have root access to your system,  then
> you should be able to resolve the Open SHMEM support issue by installing
> the XPMEM device driver on your system, and rebuilding UCX so it picks
> up XPMEM support.
> 
> The source code is on GitHub:
> 
> https://github.com/hjelmn/xpmem
> 
> Some instructions on how to build the xpmem device driver are at
> 
> https://github.com/hjelmn/xpmem/wiki/Installing-XPMEM
> 
> You will need to install the kernel source and symbols rpms on your
> system before building the xpmem device driver.
> 
> Hope this helps,
> 
> Howard
> 
> 
> Am Di., 13. Nov. 2018 um 15:00 Uhr schrieb Bert Wesarg via users <
> users@lists.open-mpi.org>:
> 
> > Hi,
> >
> > On Mon, Nov 12, 2018 at 10:49 PM Pritchard Jr., Howard via announce
> >  wrote:
> > >
> > > The Open MPI Team, representing a consortium of research, academic, and
> > > industry partners, is pleased to announce the release of Open MPI version
> > > 4.0.0.
> > >
> > > v4.0.0 is the start of a new release series for Open MPI.  Starting with
> > > this release, the OpenIB BTL supports only iWarp and RoCE by default.
> > > Starting with this release,  UCX is the preferred transport protocol
> > > for Infiniband interconnects. The embedded PMIx runtime has been updated
> > > to 3.0.2.  The embedded Romio has been updated to 3.2.1.  This
> > > release is ABI compatible with the 3.x release streams. There have been
> > numerous
> > > other bug fixes and performance improvements.
> > >
> > > Note that starting with Open MPI v4.0.0, prototypes for several
> > > MPI-1 symbols that were deleted in the MPI-3.0 specification
> > > (which was published in 2012) are no longer available by default in
> > > mpi.h. See the README for further details.
> > >
> > > Version 4.0.0 can be downloaded from the main Open MPI web site:
> > >
> > >   https://www.open-mpi.org/software/ompi/v4.0/
> > >
> > >
> > > 4.0.0 -- September, 2018
> > > 
> > >
> > > - OSHMEM updated to the OpenSHMEM 1.4 API.
> > > - Do not build OpenSHMEM layer when there are no SPMLs available.
> > >   Currently, this means the OpenSHMEM layer will only build if
> > >   a MXM or UCX library is found.
> >
> > so what is the most convenient way to get SHMEM working on a single
> > shared-memory node (i.e. a notebook)? I just realized that I haven't had
> > a working SHMEM since Open MPI 3.0. Building with UCX does not help
> > either; I tried with UCX 1.4, but Open MPI SHMEM
> > still does not work:
> >
> > $ oshcc -o shmem_hello_world-4.0.0 openmpi-4.0.0/examples/hello_oshmem_c.c
> > $ oshrun -np 2 ./shmem_hello_world-4.0.0
> > [1542109710.217344] [tudtug:27715:0] select.c:406  UCX  ERROR
> > no remote registered memory access transport to tudtug:27716:
> > self/self - Destination is unreachable, tcp/enp0s31f6 - no put short,
> > tcp/wlp61s0 - no put short, mm/sysv - Destination is unreachable,
> > mm/posix - Destination is unreachable, cma/cma - no put short
> > [1542109710.217344] [tudtug:27716:0] select.c:406  UCX  ERROR
> > no remote registered memory access transport to tudtug:27715:
> > self/self - Destination is unreachable, tcp/enp0s31f6 - no put short,
> > tcp/wlp61s0 - no put short, mm/sysv - Destination is unreachable,
> > mm/posix - Destination is unreachable, cma/cma - no put short
> > [tudtug:27715] ../../../../../oshmem/mca/spml/ucx/spml_ucx.c:266
> > Error: ucp_ep_create(proc=1/2) failed: Destination is unreachable
> > [tudtug:27715] ../../../../../oshmem/mca/spml/ucx/spml_ucx.c:305
> > Error: add procs FAILED rc=-2
> > [tudtug:27716] ../../../../../oshmem/mca/spml/ucx/spml_ucx.c:266
> > Error: ucp_ep_create(proc=1/2) failed: Destination is unreachable
> > [tudtug:27716] ../../../../../oshmem/mca/spml/ucx/spml_ucx.c:305
> > Error: add procs FAILED rc=-2
> > --
> > It looks like SHMEM_INIT failed for some reason; your parallel process is
> > likely to abort.  There are many reasons that a parallel process can
> > fail during SHMEM_INIT; some of which are due to configuration or
> > environment
> > problems.  This failure appears to be an internal failure; here's some
> > additional information (which may only be relevant to an Open SHMEM
> > developer):
> >
> >   SPML add procs failed
> >   --> Returned "Out of resource" (-2) instead of "Success" (0)
> > --
> > [tudtug:27715] Error: pshmem_init.c:80 - _shmem_init() SHMEM failed to
> > initialize - aborting
> > [tudtug:27716] Error: pshmem_init.c:80 - _shmem_init() SHMEM failed to
> > initialize - aborting
> > --
> > SHMEM_ABORT was invoked on rank 0 (pid 27715, host=tudtug) with 

Re: [OMPI users] [Open MPI Announce] Open MPI 4.0.0 Released

2018-11-13 Thread Howard Pritchard
Hello Bert,

What OS are you running on your notebook?

If you are running Linux, and you have root access to your system,  then
you should be able to resolve the Open SHMEM support issue by installing
the XPMEM device driver on your system, and rebuilding UCX so it picks
up XPMEM support.

The source code is on GitHub:

https://github.com/hjelmn/xpmem

Some instructions on how to build the xpmem device driver are at

https://github.com/hjelmn/xpmem/wiki/Installing-XPMEM

You will need to install the kernel source and symbols rpms on your
system before building the xpmem device driver.
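
For orientation, the whole sequence looks roughly like this (the /opt
prefixes and the module-loading step are placeholders; the wiki page above
is the authoritative reference):

$ cd xpmem
$ ./autogen.sh && ./configure --prefix=/opt/xpmem
$ make && sudo make install          # builds the kernel module and libxpmem
$ sudo insmod kernel/xpmem.ko        # or modprobe xpmem, depending on how it was installed
$ cd ../ucx-1.4.0
$ ./configure --prefix=/opt/ucx --with-xpmem=/opt/xpmem
$ make && sudo make install
$ cd ../openmpi-4.0.0
$ ./configure --prefix=/opt/openmpi --with-ucx=/opt/ucx
$ make && sudo make install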

Hope this helps,

Howard


Am Di., 13. Nov. 2018 um 15:00 Uhr schrieb Bert Wesarg via users <
users@lists.open-mpi.org>:

> Hi,
>
> On Mon, Nov 12, 2018 at 10:49 PM Pritchard Jr., Howard via announce
>  wrote:
> >
> > The Open MPI Team, representing a consortium of research, academic, and
> > industry partners, is pleased to announce the release of Open MPI version
> > 4.0.0.
> >
> > v4.0.0 is the start of a new release series for Open MPI.  Starting with
> > this release, the OpenIB BTL supports only iWarp and RoCE by default.
> > Starting with this release,  UCX is the preferred transport protocol
> > for Infiniband interconnects. The embedded PMIx runtime has been updated
> > to 3.0.2.  The embedded Romio has been updated to 3.2.1.  This
> > release is ABI compatible with the 3.x release streams. There have been
> numerous
> > other bug fixes and performance improvements.
> >
> > Note that starting with Open MPI v4.0.0, prototypes for several
> > MPI-1 symbols that were deleted in the MPI-3.0 specification
> > (which was published in 2012) are no longer available by default in
> > mpi.h. See the README for further details.
> >
> > Version 4.0.0 can be downloaded from the main Open MPI web site:
> >
> >   https://www.open-mpi.org/software/ompi/v4.0/
> >
> >
> > 4.0.0 -- September, 2018
> > 
> >
> > - OSHMEM updated to the OpenSHMEM 1.4 API.
> > - Do not build OpenSHMEM layer when there are no SPMLs available.
> >   Currently, this means the OpenSHMEM layer will only build if
> >   a MXM or UCX library is found.
>
> so what is the most convenient way to get SHMEM working on a single
> shared-memory node (i.e. a notebook)? I just realized that I haven't had
> a working SHMEM since Open MPI 3.0. Building with UCX does not help
> either; I tried with UCX 1.4, but Open MPI SHMEM
> still does not work:
>
> $ oshcc -o shmem_hello_world-4.0.0 openmpi-4.0.0/examples/hello_oshmem_c.c
> $ oshrun -np 2 ./shmem_hello_world-4.0.0
> [1542109710.217344] [tudtug:27715:0] select.c:406  UCX  ERROR
> no remote registered memory access transport to tudtug:27716:
> self/self - Destination is unreachable, tcp/enp0s31f6 - no put short,
> tcp/wlp61s0 - no put short, mm/sysv - Destination is unreachable,
> mm/posix - Destination is unreachable, cma/cma - no put short
> [1542109710.217344] [tudtug:27716:0] select.c:406  UCX  ERROR
> no remote registered memory access transport to tudtug:27715:
> self/self - Destination is unreachable, tcp/enp0s31f6 - no put short,
> tcp/wlp61s0 - no put short, mm/sysv - Destination is unreachable,
> mm/posix - Destination is unreachable, cma/cma - no put short
> [tudtug:27715] ../../../../../oshmem/mca/spml/ucx/spml_ucx.c:266
> Error: ucp_ep_create(proc=1/2) failed: Destination is unreachable
> [tudtug:27715] ../../../../../oshmem/mca/spml/ucx/spml_ucx.c:305
> Error: add procs FAILED rc=-2
> [tudtug:27716] ../../../../../oshmem/mca/spml/ucx/spml_ucx.c:266
> Error: ucp_ep_create(proc=1/2) failed: Destination is unreachable
> [tudtug:27716] ../../../../../oshmem/mca/spml/ucx/spml_ucx.c:305
> Error: add procs FAILED rc=-2
> --
> It looks like SHMEM_INIT failed for some reason; your parallel process is
> likely to abort.  There are many reasons that a parallel process can
> fail during SHMEM_INIT; some of which are due to configuration or
> environment
> problems.  This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open SHMEM
> developer):
>
>   SPML add procs failed
>   --> Returned "Out of resource" (-2) instead of "Success" (0)
> --
> [tudtug:27715] Error: pshmem_init.c:80 - _shmem_init() SHMEM failed to
> initialize - aborting
> [tudtug:27716] Error: pshmem_init.c:80 - _shmem_init() SHMEM failed to
> initialize - aborting
> --
> SHMEM_ABORT was invoked on rank 0 (pid 27715, host=tudtug) with errorcode
> -1.
> --
> --
> A SHMEM process is aborting at a time when it cannot guarantee that all
> of its peer processes in the job will be killed properly.  You should
> 

Re: [OMPI users] Building OpenMPI with Lustre support using PGI fails

2018-11-13 Thread gilles
Raymond,

can you please compress and post your config.log ?
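
Something like this is enough (the output file name is arbitrary):

$ gzip -c config.log > ompi-config.log.gz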


Cheers,

Gilles

- Original Message -
> I am trying to build Open MPI with Lustre support using PGI 18.7 on
> CentOS 7.5 (1804).
> 
> It builds successfully with the Intel compilers, but fails to find the
> necessary Lustre components with the PGI compiler.
> 
> I have tried building Open MPI 4.0.0, 3.1.3 and 2.1.5. I can build
> Open MPI, but configure does not find the proper Lustre files.
> 
> Lustre is installed from current client RPMS, version 2.10.5
> 
> Include files are in /usr/include/lustre
> 
> When specifying --with-lustre, I get:
> 
> --- MCA component fs:lustre (m4 configuration macro)
> checking for MCA component fs:lustre compile mode... dso
> checking --with-lustre value... simple ok (unspecified value)
> looking for header without includes
> checking lustre/lustreapi.h usability... yes
> checking lustre/lustreapi.h presence... yes
> checking for lustre/lustreapi.h... yes
> checking for library containing llapi_file_create... -llustreapi
> checking if liblustreapi requires libnl v1 or v3...
> checking for required lustre data structures... no
> configure: error: Lustre support requested but not found. Aborting
> 
> 
> --
>  Ray Muno
>  IT Manager
>  University of Minnesota
> 
>  Aerospace Engineering and Mechanics     Mechanical Engineering
>  110 Union St. S.E.                      111 Church Street SE
>  Minneapolis, MN 55455                   Minneapolis, MN 55455
> 


[OMPI users] Building OpenMPI with Lustre support using PGI fails

2018-11-13 Thread Raymond Muno
I am trying to build Open MPI with Lustre support using PGI 18.7 on
CentOS 7.5 (1804).

It builds successfully with the Intel compilers, but fails to find the
necessary Lustre components with the PGI compiler.

I have tried building Open MPI 4.0.0, 3.1.3 and 2.1.5. I can build
Open MPI, but configure does not find the proper Lustre files.


Lustre is installed from current client RPMS, version 2.10.5

Include files are in /usr/include/lustre

When specifying --with-lustre, I get:

--- MCA component fs:lustre (m4 configuration macro)
checking for MCA component fs:lustre compile mode... dso
checking --with-lustre value... simple ok (unspecified value)
looking for header without includes
checking lustre/lustreapi.h usability... yes
checking lustre/lustreapi.h presence... yes
checking for lustre/lustreapi.h... yes
checking for library containing llapi_file_create... -llustreapi
checking if liblustreapi requires libnl v1 or v3...
checking for required lustre data structures... no
configure: error: Lustre support requested but not found. Aborting
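
To narrow down where the PGI compiler trips, I can also compile a rough,
hand-rolled stand-in for the configure probes. This is only an approximation
(the real "required lustre data structures" test checks more than this); it
uses just the header and the symbol named in the output above:

/* lustre_probe.c: rough stand-in for the fs:lustre configure probes.
 * Only the header and the llapi_file_create symbol from the output above
 * are used here; the real data-structure check is more involved. */
#include <stdio.h>
#include <lustre/lustreapi.h>

int main(void)
{
    /* Taking the address forces the linker to resolve the symbol
     * from -llustreapi. */
    printf("llapi_file_create at %p\n", (void *)llapi_file_create);
    return 0;
}

Compiled with something like:

$ pgcc lustre_probe.c -o lustre_probe -llustreapi

If this already fails, the compiler diagnostics should show which Lustre
declaration PGI rejects.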


--
 Ray Muno
 IT Manager
 University of Minnesota

 Aerospace Engineering and Mechanics     Mechanical Engineering
 110 Union St. S.E.                      111 Church Street SE
 Minneapolis, MN 55455                   Minneapolis, MN 55455


[OMPI users] One question about progression of operations in MPI

2018-11-13 Thread Weicheng Xue
Hi,

I am a student whose research work includes using MPI and OpenACC to
accelerate our in-house CFD research code on multiple GPUs. I am having a
big issue related to the "progression of operations in MPI" and think
your input would be very helpful.

I am currently testing the performance of overlapping communication and
computation in this code. Communication takes place between hosts (CPUs) and
computation is done on devices (GPUs). However, in my case the actual
communication always starts only after the computation has finished, so even
though I wrote my code in an overlapping style, there is no overlap, because
Open MPI does not support asynchronous progression. I found that MPI often
makes progress (i.e. actually sends or receives the data) only while I am
blocking in a call to MPI_Wait, at which point no overlap is possible at all.
My goal is to use overlap to hide communication latency and thus improve the
performance of my code. Is there an approach you can suggest? Thank you very
much!

I am using the PGI 17.5 compiler and Open MPI 2.0.0. A 100 Gbps
EDR InfiniBand network is used for MPI traffic. "ompi_info" reports the
thread support as "Thread support: posix (MPI_THREAD_MULTIPLE:
yes, OPAL support: yes, OMPI progress: no, ORTE progress: yes, Event lib:
yes)".

Best Regards,

Weicheng Xue

[OMPI users] [Open MPI Announce] Open MPI 4.0.0 Released

2018-11-13 Thread Bert Wesarg via users
Hi,

On Mon, Nov 12, 2018 at 10:49 PM Pritchard Jr., Howard via announce
 wrote:
>
> The Open MPI Team, representing a consortium of research, academic, and
> industry partners, is pleased to announce the release of Open MPI version
> 4.0.0.
>
> v4.0.0 is the start of a new release series for Open MPI.  Starting with
> this release, the OpenIB BTL supports only iWarp and RoCE by default.
> Starting with this release,  UCX is the preferred transport protocol
> for Infiniband interconnects. The embedded PMIx runtime has been updated
> to 3.0.2.  The embedded Romio has been updated to 3.2.1.  This
> release is ABI compatible with the 3.x release streams. There have been 
> numerous
> other bug fixes and performance improvements.
>
> Note that starting with Open MPI v4.0.0, prototypes for several
> MPI-1 symbols that were deleted in the MPI-3.0 specification
> (which was published in 2012) are no longer available by default in
> mpi.h. See the README for further details.
>
> Version 4.0.0 can be downloaded from the main Open MPI web site:
>
>   https://www.open-mpi.org/software/ompi/v4.0/
>
>
> 4.0.0 -- September, 2018
> 
>
> - OSHMEM updated to the OpenSHMEM 1.4 API.
> - Do not build OpenSHMEM layer when there are no SPMLs available.
>   Currently, this means the OpenSHMEM layer will only build if
>   a MXM or UCX library is found.

so what is the most convenient way to get SHMEM working on a single
shared-memory node (i.e. a notebook)? I just realized that I haven't had
a working SHMEM since Open MPI 3.0. Building with UCX does not help
either; I tried with UCX 1.4, but Open MPI SHMEM
still does not work:

$ oshcc -o shmem_hello_world-4.0.0 openmpi-4.0.0/examples/hello_oshmem_c.c
$ oshrun -np 2 ./shmem_hello_world-4.0.0
[1542109710.217344] [tudtug:27715:0] select.c:406  UCX  ERROR
no remote registered memory access transport to tudtug:27716:
self/self - Destination is unreachable, tcp/enp0s31f6 - no put short,
tcp/wlp61s0 - no put short, mm/sysv - Destination is unreachable,
mm/posix - Destination is unreachable, cma/cma - no put short
[1542109710.217344] [tudtug:27716:0] select.c:406  UCX  ERROR
no remote registered memory access transport to tudtug:27715:
self/self - Destination is unreachable, tcp/enp0s31f6 - no put short,
tcp/wlp61s0 - no put short, mm/sysv - Destination is unreachable,
mm/posix - Destination is unreachable, cma/cma - no put short
[tudtug:27715] ../../../../../oshmem/mca/spml/ucx/spml_ucx.c:266
Error: ucp_ep_create(proc=1/2) failed: Destination is unreachable
[tudtug:27715] ../../../../../oshmem/mca/spml/ucx/spml_ucx.c:305
Error: add procs FAILED rc=-2
[tudtug:27716] ../../../../../oshmem/mca/spml/ucx/spml_ucx.c:266
Error: ucp_ep_create(proc=1/2) failed: Destination is unreachable
[tudtug:27716] ../../../../../oshmem/mca/spml/ucx/spml_ucx.c:305
Error: add procs FAILED rc=-2
--
It looks like SHMEM_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during SHMEM_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open SHMEM
developer):

  SPML add procs failed
  --> Returned "Out of resource" (-2) instead of "Success" (0)
--
[tudtug:27715] Error: pshmem_init.c:80 - _shmem_init() SHMEM failed to
initialize - aborting
[tudtug:27716] Error: pshmem_init.c:80 - _shmem_init() SHMEM failed to
initialize - aborting
--
SHMEM_ABORT was invoked on rank 0 (pid 27715, host=tudtug) with errorcode -1.
--
--
A SHMEM process is aborting at a time when it cannot guarantee that all
of its peer processes in the job will be killed properly.  You should
double check that everything has shut down cleanly.

Local host: tudtug
PID:27715
--
--
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--
--
oshrun detected that one or more processes exited with non-zero
status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[2212,1],1]
  Exit code:255
--
[tudtug:27710] 1 more process has sent help message
help-shmem-runtime.txt / 

[OMPI users] stack overflow in routine alloca() for Java programs in openmpi-master with pgcc-18.4

2018-11-13 Thread Siegmar Gross

Hi,

I've installed openmpi-v4.0.x-20180241-725f625 and
openmpi-master-201811100305-3dc1629 on my "SUSE Linux Enterprise
Server 12.3 (x86_64)" with pgcc-18.4. Unfortunately, I get the
following error for my Java programs with openmpi-master.

loki java 130 ompi_info | grep "Configure command line:"
  Configure command line: '--prefix=/usr/local/openmpi-master_64_pgcc' 
'--libdir=/usr/local/openmpi-master_64_pgcc/lib64' 
'--with-jdk-bindir=/usr/local/jdk-11/bin' 
'--with-jdk-headers=/usr/local/jdk-11/include' 'JAVA_HOME=/usr/local/jdk-11' 
'LDFLAGS=-m64 -Wl,-z -Wl,noexecstack -L/usr/local/pgi/linux86-64/18.4/lib 
-R/usr/local/pgi/linux86-64/18.4/lib' 'LIBS=-lpgm' 'CC=pgcc' 'CXX=pgc++' 
'FC=pgfortran' 'CFLAGS=-c11 -m64' 'CXXFLAGS=-m64' 'FCFLAGS=-m64' 'CPP=cpp' 
'CXXCPP=cpp' '--enable-mpi-cxx' '--enable-cxx-exceptions' '--enable-mpi-java' 
'--with-valgrind=/usr/local/valgrind' '--with-hwloc=internal' '--without-verbs' 
'--with-wrapper-cflags=-c11 -m64' '--with-wrapper-cxxflags=-m64' 
'--with-wrapper-fcflags=-m64' '--enable-debug'

loki java 131 mpijavac InitFinalizeMain.java
warning: [path] bad path element 
"/usr/local/openmpi-master_64_pgcc/lib64/shmem.jar": no such file or directory

1 warning
loki java 132 mpiexec java InitFinalizeMain
Error: in routine alloca() there is a
stack overflow: thread 0, max 8180KB, used 0KB, request 42B
--
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--
Error: in routine alloca() there is a
stack overflow: thread 0, max 8180KB, used 0KB, request 42B
--
mpiexec detected that one or more processes exited with non-zero status, thus 
causing

the job to be terminated. The first process to do so was:

  Process name: [[47273,1],4]
  Exit code:127
--
loki java 133



Everything works as expected for openmpi-4.0.x.

loki java 108 ompi_info | grep "Configure command line:"
  Configure command line: '--prefix=/usr/local/openmpi-4.0.0_64_pgcc' 
'--libdir=/usr/local/openmpi-4.0.0_64_pgcc/lib64' 
'--with-jdk-bindir=/usr/local/jdk-11/bin' 
'--with-jdk-headers=/usr/local/jdk-11/include' 'JAVA_HOME=/usr/local/jdk-11' 
'LDFLAGS=-m64 -Wl,-z -Wl,noexecstack -L/usr/local/pgi/linux86-64/18.4/lib 
-R/usr/local/pgi/linux86-64/18.4/lib' 'LIBS=-lpgm' 'CC=pgcc' 'CXX=pgc++' 
'FC=pgfortran' 'CFLAGS=-c11 -m64' 'CXXFLAGS=-m64' 'FCFLAGS=-m64' 'CPP=cpp' 
'CXXCPP=cpp' '--enable-mpi-cxx' '--enable-cxx-exceptions' '--enable-mpi-java' 
'--with-valgrind=/usr/local/valgrind' '--with-hwloc=internal' '--without-verbs' 
'--with-wrapper-cflags=-c11 -m64' '--with-wrapper-cxxflags=-m64' 
'--with-wrapper-fcflags=-m64' '--enable-debug'

loki java 109 mpijavac InitFinalizeMain.java
warning: [path] bad path element 
"/usr/local/openmpi-4.0.0_64_pgcc/lib64/shmem.jar": no such file or directory

1 warning
loki java 110 mpiexec java InitFinalizeMain
Hello!
Hello!
Hello!
Hello!
Hello!
Hello!
Hello!
Hello!
Hello!
Hello!
Hello!
Hello!
loki java 111



I would be grateful if somebody could fix the problem. Do you need anything
else from me? Thank you very much in advance for any help.


Kind regards

Siegmar
/* The program demonstrates how to initialize and finalize an
 * MPI environment.
 *
 * "mpijavac" and Java-bindings are available in "Open MPI
 * version 1.7.4" or newer.
 *
 *
 * Class file generation:
 *   mpijavac InitFinalizeMain.java
 *
 * Usage:
 *   mpiexec [parameters] java [parameters] InitFinalizeMain
 *
 * Examples:
 *   mpiexec java InitFinalizeMain
 *   mpiexec java -cp $HOME/mpi_classfiles InitFinalizeMain
 *
 *
 * File: InitFinalizeMain.java		Author: S. Gross
 * Date: 09.09.2013
 *
 */

import mpi.*;

public class InitFinalizeMain
{
  public static void main (String args[]) throws MPIException
  {
MPI.Init (args);
System.out.print ("Hello!\n");
MPI.Finalize ();
  }
}

[OMPI users] ORTE_ERROR_LOG: Pack data mismatch for openmpi-v4.0.x and openmpi-master

2018-11-13 Thread Siegmar Gross

Hi,

I've installed openmpi-v4.0.x-20180241-725f625 and
openmpi-master-201811100305-3dc1629 on my "SUSE Linux Enterprise
Server 12.3 (x86_64)" with Sun C 5.15 (Oracle Developer Studio 12.6),
gcc-6.4.0, icc-19.x, and pgcc-18.4. Unfortunately, I still get the
following errors with all compilers and both versions.

https://www.mail-archive.com/users@lists.open-mpi.org/msg32796.html
https://www.mail-archive.com/users@lists.open-mpi.org/msg32797.html

I'm also still unable to build openmpi-master with Sun C due to the
error that I reported some time ago.

https://www.mail-archive.com/users@lists.open-mpi.org/msg32816.html


I would be grateful if somebody could fix the problems. Do you need anything
else from me? Thank you very much in advance for any help.


Kind regards

Siegmar