Re: [OMPI users] HDF5 1.10.4 "make check" problems w/OpenMPI 3.1.3

2019-02-16 Thread Gilles Gouaillardet
Ryan,

What filesystem are you running on ?

Open MPI defaults to the ompio component, except on Lustre filesystems,
where ROMIO is used.
(If the issue is related to ROMIO, that would explain why you did not
see any difference; in that case, you might want to try another
filesystem, such as a local filesystem or NFS.)
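
As a quick check, you can also list the io components available in your
build and force one explicitly. A minimal sketch (assuming the ROMIO
component in Open MPI 3.1.x is named romio314; ompi_info shows the exact
name):

ompi_info | grep "MCA io"
export OMPI_MCA_io=romio314    # force ROMIO
export OMPI_MCA_io=ompio       # or force ompio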


Cheers,

Gilles

On Sun, Feb 17, 2019 at 3:08 AM Ryan Novosielski  wrote:
>
> I verified that it makes it through to a bash prompt, but I’m a little less 
> confident that 'make test' doesn't do something that clears it. Any 
> recommendation for a way to verify?
>
> In any case, no change, unfortunately.
>
> Sent from my iPhone
>
> > On Feb 16, 2019, at 08:13, Gabriel, Edgar  wrote:
> >
> > What file system are you running on?
> >
> > I will look into this, but it might be later next week. I just wanted to 
> > emphasize that we are regularly running the parallel hdf5 tests with ompio, 
> > and I am not aware of any outstanding items that do not work (and are 
> > supposed to work). That being said, I run the tests manually, and not the 
> > 'make test' commands. Will have to check which tests are being run by that.
> >
> > Edgar
> >
> >> -Original Message-
> >> From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Gilles
> >> Gouaillardet
> >> Sent: Saturday, February 16, 2019 1:49 AM
> >> To: Open MPI Users 
> >> Subject: Re: [OMPI users] HDF5 1.10.4 "make check" problems w/OpenMPI
> >> 3.1.3
> >>
> >> Ryan,
> >>
> >> Can you
> >>
> >> export OMPI_MCA_io=^ompio
> >>
> >> and try again after you made sure this environment variable is passed by 
> >> srun
> >> to the MPI tasks ?
> >>
> >> We have identified and fixed several issues specific to the (default) ompio
> >> component, so that could be a valid workaround until the next release.
> >>
> >> Cheers,
> >>
> >> Gilles
> >>
> >> Ryan Novosielski  wrote:
> >>> Hi there,
> >>>
> >>> Honestly don’t know which piece of this puzzle to look at or how to get 
> >>> more
> >> information for troubleshooting. I successfully built HDF5 1.10.4 with RHEL
> >> system GCC 4.8.5 and OpenMPI 3.1.3. Running the “make check” in HDF5 is
> >> failing at the below point; I am using a value of RUNPARALLEL='srun --
> >> mpi=pmi2 -p main -t 1:00:00 -n6 -N1’ and have a SLURM that’s otherwise
> >> properly configured.
> >>>
> >>> Thanks for any help you can provide.
> >>>
> >>> make[4]: Entering directory 
> >>> `/scratch/novosirj/install-files/hdf5-1.10.4-build-
> >> gcc-4.8-openmpi-3.1.3/testpar'
> >>> 
> >>> Testing  t_mpi
> >>> 
> >>> t_mpi  Test Log
> >>> 
> >>> srun: job 84126610 queued and waiting for resources
> >>> srun: job 84126610 has been allocated resources
> >>> srun: error: slepner023: tasks 0-5: Alarm clock 0.01user 0.00system
> >>> 20:03.95elapsed 0%CPU (0avgtext+0avgdata 5152maxresident)k
> >>> 0inputs+0outputs (0major+1529minor)pagefaults 0swaps
> >>> make[4]: *** [t_mpi.chkexe_] Error 1
> >>> make[4]: Leaving directory 
> >>> `/scratch/novosirj/install-files/hdf5-1.10.4-build-
> >> gcc-4.8-openmpi-3.1.3/testpar'
> >>> make[3]: *** [build-check-p] Error 1
> >>> make[3]: Leaving directory 
> >>> `/scratch/novosirj/install-files/hdf5-1.10.4-build-
> >> gcc-4.8-openmpi-3.1.3/testpar'
> >>> make[2]: *** [test] Error 2
> >>> make[2]: Leaving directory 
> >>> `/scratch/novosirj/install-files/hdf5-1.10.4-build-
> >> gcc-4.8-openmpi-3.1.3/testpar'
> >>> make[1]: *** [check-am] Error 2
> >>> make[1]: Leaving directory 
> >>> `/scratch/novosirj/install-files/hdf5-1.10.4-build-
> >> gcc-4.8-openmpi-3.1.3/testpar'
> >>> make: *** [check-recursive] Error 1
> >>>
> >>> --
> >>> 
> >>> || \\UTGERS,   
> >>> |---*O*---
> >>> ||_// the State | Ryan Novosielski - novos...@rutgers.edu
> >>> || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS 
> >>> Campus
> >>> ||  \\of NJ | Office of Advanced Research Computing - MSB C630, 
> >>> Newark
> >>>  `'
> >> ___
> >> users mailing list
> >> users@lists.open-mpi.org
> >> https://lists.open-mpi.org/mailman/listinfo/users
> > ___
> > users mailing list
> > users@lists.open-mpi.org
> > https://lists.open-mpi.org/mailman/listinfo/users
> ___
> users mailing list
> users@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/users
___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users

Re: [OMPI users] What's the right approach to run a singleton MPI+OpenMP process

2019-02-16 Thread Gilles Gouaillardet
Simone,

If you want to run a single MPI task, you can use any of the following:
 - mpirun -np 1 ./a.out (this is the most standard option)
 - ./a.out (this is singleton mode; note that a.out will fork&exec an
orted daemon under the hood, which is necessary, for example, if your
app calls MPI_Comm_spawn())
 - OMPI_MCA_ess_singleton_isolated=1 ./a.out (this is the isolated
singleton mode, and no orted daemon is spawned; this is faster, but the
app will abort if it invokes, for example, MPI_Comm_spawn())

mpirun does set an affinity mask by default: one core per task if you
run 2 MPI tasks or fewer, and a NUMA domain otherwise.

Since you run a single MPI task and the default affinity mask is not a
good fit, you can either run in singleton mode (and consider the
isolated singleton if it fits your use case), or use mpirun --bind-to
none ... -np 1.

Running one OpenMP thread per core vs hyperthread has to do with the
OpenMP runtime and not Open MPI.
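
For example, a minimal sketch assuming the LLVM OpenMP runtime honors the
standard OMP_PLACES / OMP_PROC_BIND environment variables (your machine has
160/8 = 20 physical cores):

export OMP_NUM_THREADS=20
export OMP_PLACES=cores
export OMP_PROC_BIND=close
mpirun --bind-to none -np 1 ./a.out

You can add --report-bindings to mpirun to double check what mask (if any)
was applied.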

Cheers,

Gilles

On Sun, Feb 17, 2019 at 12:15 PM Simone Atzeni  wrote:
>
> Hi,
>
>
>
> For testing purposes I run some MPI+OpenMP benchmarks with `mpirun -np 1 
> ./a.out`, and I am using OpenMPI 3.1.3.
>
> As far as I understand, `mpirun` sets an affinity mask, and the OpenMP 
> runtime (in my case the LLVM OpenMP RT) respects this mask and only sees 1 
> physical core.
>
> In my case, I am running on a POWER8, which has 8 logical cores per physical 
> core. The OpenMP runtime normally creates as many threads as there are 
> logical cores in the machine (160 on my machine), but because of the mask it 
> will create only 8 threads in this case.
>
> All these threads run on the same physical core, making the program slower 
> than if each thread ran on a different physical core.
>
>
>
> So my question is, what’s the right way to run a single MPI process such that 
> the OpenMP threads can run on different physical cores, independently of the 
> mask set by mpirun?
>
>
>
> I know about the option `--bind-to none`; with it, all the cores in the 
> system become available and the OpenMP runtime uses all of them.
>
> Otherwise, doing some web search I read that a singleton MPI program should 
> be executed with ` OMPI_MCA_ess_singleton_isolated=1 ./a.out` without 
> `mpirun` at all, but I couldn’t find a good explanation of it.
>
>
>
> Is there anyone that could clarify this?
>
>
>
> Thank you!
>
> Simone
>
>
>
> ___
> users mailing list
> users@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/users
___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users

[OMPI users] What's the right approach to run a singleton MPI+OpenMP process

2019-02-16 Thread Simone Atzeni
Hi,

For testing purposes I run some MPI+OpenMP benchmarks with `mpirun -np 1 
./a.out`, and I am using OpenMPI 3.1.3.
As far as I understand, `mpirun` sets an affinity mask, and the OpenMP runtime 
(in my case the LLVM OpenMP RT) respects this mask and only sees 1 physical 
core.
In my case, I am running on a POWER8, which has 8 logical cores per physical 
core. The OpenMP runtime normally creates as many threads as there are logical 
cores in the machine (160 on my machine), but because of the mask it will 
create only 8 threads in this case.
All these threads run on the same physical core, making the program slower than 
if each thread ran on a different physical core.

So my question is, what's the right way to run a single MPI process such that 
the OpenMP threads can run on different physical cores, independently of the 
mask set by mpirun?

I know about the option `--bind-to none`; with it, all the cores in the 
system become available and the OpenMP runtime uses all of them.
Otherwise, doing some web search I read that a singleton MPI program should be 
executed with ` OMPI_MCA_ess_singleton_isolated=1 ./a.out` without `mpirun` at 
all, but I couldn't find a good explanation of it.

Is there anyone that could clarify this?

Thank you!
Simone


___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users

Re: [OMPI users] HDF5 1.10.4 "make check" problems w/OpenMPI 3.1.3

2019-02-16 Thread Ryan Novosielski
I verified that it makes it through to a bash prompt, but I’m a little less 
confident that 'make test' doesn't do something that clears it. Any 
recommendation for a way to verify?

In any case, no change, unfortunately. 

Sent from my iPhone

> On Feb 16, 2019, at 08:13, Gabriel, Edgar  wrote:
> 
> What file system are you running on?
> 
> I will look into this, but it might be later next week. I just wanted to 
> emphasize that we are regularly running the parallel hdf5 tests with ompio, 
> and I am not aware of any outstanding items that do not work (and are 
> supposed to work). That being said, I run the tests manually, and not the 
> 'make test' commands. Will have to check which tests are being run by that.
> 
> Edgar
> 
>> -Original Message-
>> From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Gilles
>> Gouaillardet
>> Sent: Saturday, February 16, 2019 1:49 AM
>> To: Open MPI Users 
>> Subject: Re: [OMPI users] HDF5 1.10.4 "make check" problems w/OpenMPI
>> 3.1.3
>> 
>> Ryan,
>> 
>> Can you
>> 
>> export OMPI_MCA_io=^ompio
>> 
>> and try again after you made sure this environment variable is passed by srun
>> to the MPI tasks ?
>> 
>> We have identified and fixed several issues specific to the (default) ompio
>> component, so that could be a valid workaround until the next release.
>> 
>> Cheers,
>> 
>> Gilles
>> 
>> Ryan Novosielski  wrote:
>>> Hi there,
>>> 
>>> Honestly don’t know which piece of this puzzle to look at or how to get more
>> information for troubleshooting. I successfully built HDF5 1.10.4 with RHEL
>> system GCC 4.8.5 and OpenMPI 3.1.3. Running the “make check” in HDF5 is
>> failing at the below point; I am using a value of RUNPARALLEL='srun --
>> mpi=pmi2 -p main -t 1:00:00 -n6 -N1’ and have a SLURM that’s otherwise
>> properly configured.
>>> 
>>> Thanks for any help you can provide.
>>> 
>>> make[4]: Entering directory 
>>> `/scratch/novosirj/install-files/hdf5-1.10.4-build-
>> gcc-4.8-openmpi-3.1.3/testpar'
>>> 
>>> Testing  t_mpi
>>> 
>>> t_mpi  Test Log
>>> 
>>> srun: job 84126610 queued and waiting for resources
>>> srun: job 84126610 has been allocated resources
>>> srun: error: slepner023: tasks 0-5: Alarm clock 0.01user 0.00system
>>> 20:03.95elapsed 0%CPU (0avgtext+0avgdata 5152maxresident)k
>>> 0inputs+0outputs (0major+1529minor)pagefaults 0swaps
>>> make[4]: *** [t_mpi.chkexe_] Error 1
>>> make[4]: Leaving directory 
>>> `/scratch/novosirj/install-files/hdf5-1.10.4-build-
>> gcc-4.8-openmpi-3.1.3/testpar'
>>> make[3]: *** [build-check-p] Error 1
>>> make[3]: Leaving directory 
>>> `/scratch/novosirj/install-files/hdf5-1.10.4-build-
>> gcc-4.8-openmpi-3.1.3/testpar'
>>> make[2]: *** [test] Error 2
>>> make[2]: Leaving directory 
>>> `/scratch/novosirj/install-files/hdf5-1.10.4-build-
>> gcc-4.8-openmpi-3.1.3/testpar'
>>> make[1]: *** [check-am] Error 2
>>> make[1]: Leaving directory 
>>> `/scratch/novosirj/install-files/hdf5-1.10.4-build-
>> gcc-4.8-openmpi-3.1.3/testpar'
>>> make: *** [check-recursive] Error 1
>>> 
>>> --
>>> 
>>> || \\UTGERS,   
>>> |---*O*---
>>> ||_// the State | Ryan Novosielski - novos...@rutgers.edu
>>> || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
>>> ||  \\of NJ | Office of Advanced Research Computing - MSB C630, 
>>> Newark
>>>  `'
>> ___
>> users mailing list
>> users@lists.open-mpi.org
>> https://lists.open-mpi.org/mailman/listinfo/users
> ___
> users mailing list
> users@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/users
___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users

Re: [OMPI users] Segfault with OpenMPI 4 and dynamic window

2019-02-16 Thread Nathan Hjelm via users
Probably not. I think this is now fixed. Might be worth trying master to 
verify. 
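
A minimal sketch of building master from git, in case that helps (the
install prefix is just a placeholder):

git clone https://github.com/open-mpi/ompi.git
cd ompi
./autogen.pl
./configure --prefix=$HOME/ompi-master
make -j 8
make install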

> On Feb 16, 2019, at 7:01 AM, Bart Janssens  wrote:
> 
> Hi Gilles,
> 
> Thanks, that works (I had to put quotes around the ^rdma). Should I file a 
> github issue?
> 
> Cheers,
> 
> Bart
>> On 16 Feb 2019, 14:05 +0100, Gilles Gouaillardet 
>> , wrote:
>> Bart,
>> 
>> It looks like a bug that involves the osc/rdma component.
>> 
>> Meanwhile, you can
>> mpirun --mca osc ^rdma ...
>> 
>> Cheers,
>> 
>> Gilles
>> 
> ___
> users mailing list
> users@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/users
___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users

Re: [OMPI users] Segfault with OpenMPI 4 and dynamic window

2019-02-16 Thread Bart Janssens
Hi Gilles,

Thanks, that works (I had to put quotes around the ^rdma). Should I file a 
github issue?

Cheers,

Bart
On 16 Feb 2019, 14:05 +0100, Gilles Gouaillardet 
, wrote:
> Bart,
>
> It looks like a bug that involves the osc/rdma component.
>
> Meanwhile, you can
> mpirun --mca osc ^rdma ...
>
> Cheers,
>
> Gilles
>
___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users

Re: [OMPI users] HDF5 1.10.4 "make check" problems w/OpenMPI 3.1.3

2019-02-16 Thread Gabriel, Edgar
What file system are you running on?

I will look into this, but it might be later next week. I just wanted to 
emphasize that we are regularly running the parallel hdf5 tests with ompio, and 
I am not aware of any outstanding items that do not work (and are supposed to 
work). That being said, I run the tests manually, and not the 'make test' 
commands. Will have to check which tests are being run by that.
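
For reference, a sketch of running the failing test by hand with your srun
settings (assuming t_mpi has already been built in the testpar directory of
the build tree):

cd testpar
srun --mpi=pmi2 -p main -t 1:00:00 -n6 -N1 ./t_mpi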

Edgar

> -Original Message-
> From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Gilles
> Gouaillardet
> Sent: Saturday, February 16, 2019 1:49 AM
> To: Open MPI Users 
> Subject: Re: [OMPI users] HDF5 1.10.4 "make check" problems w/OpenMPI
> 3.1.3
> 
> Ryan,
> 
> Can you
> 
> export OMPI_MCA_io=^ompio
> 
> and try again after you made sure this environment variable is passed by srun
> to the MPI tasks ?
> 
> We have identified and fixed several issues specific to the (default) ompio
> component, so that could be a valid workaround until the next release.
> 
> Cheers,
> 
> Gilles
> 
> Ryan Novosielski  wrote:
> >Hi there,
> >
> >Honestly don’t know which piece of this puzzle to look at or how to get more
> information for troubleshooting. I successfully built HDF5 1.10.4 with RHEL
> system GCC 4.8.5 and OpenMPI 3.1.3. Running the “make check” in HDF5 is
> failing at the below point; I am using a value of RUNPARALLEL='srun --
> mpi=pmi2 -p main -t 1:00:00 -n6 -N1’ and have a SLURM that’s otherwise
> properly configured.
> >
> >Thanks for any help you can provide.
> >
> >make[4]: Entering directory 
> >`/scratch/novosirj/install-files/hdf5-1.10.4-build-
> gcc-4.8-openmpi-3.1.3/testpar'
> >
> >Testing  t_mpi
> >
> >t_mpi  Test Log
> >
> >srun: job 84126610 queued and waiting for resources
> >srun: job 84126610 has been allocated resources
> >srun: error: slepner023: tasks 0-5: Alarm clock 0.01user 0.00system
> >20:03.95elapsed 0%CPU (0avgtext+0avgdata 5152maxresident)k
> >0inputs+0outputs (0major+1529minor)pagefaults 0swaps
> >make[4]: *** [t_mpi.chkexe_] Error 1
> >make[4]: Leaving directory 
> >`/scratch/novosirj/install-files/hdf5-1.10.4-build-
> gcc-4.8-openmpi-3.1.3/testpar'
> >make[3]: *** [build-check-p] Error 1
> >make[3]: Leaving directory 
> >`/scratch/novosirj/install-files/hdf5-1.10.4-build-
> gcc-4.8-openmpi-3.1.3/testpar'
> >make[2]: *** [test] Error 2
> >make[2]: Leaving directory 
> >`/scratch/novosirj/install-files/hdf5-1.10.4-build-
> gcc-4.8-openmpi-3.1.3/testpar'
> >make[1]: *** [check-am] Error 2
> >make[1]: Leaving directory 
> >`/scratch/novosirj/install-files/hdf5-1.10.4-build-
> gcc-4.8-openmpi-3.1.3/testpar'
> >make: *** [check-recursive] Error 1
> >
> >--
> >
> >|| \\UTGERS,  
> >|---*O*---
> >||_// the State   | Ryan Novosielski - novos...@rutgers.edu
> >|| \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
> >||  \\of NJ   | Office of Advanced Research Computing - MSB C630, 
> >Newark
> >   `'
> ___
> users mailing list
> users@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/users
___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users

Re: [OMPI users] Segfault with OpenMPI 4 and dynamic window

2019-02-16 Thread Gilles Gouaillardet
Bart,

It  looks like a bug that involves the osc/rdma component.

Meanwhile, you can
mpirun --mca osc ^rdma ...
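
(Depending on your shell, you might need to quote the caret so it is not
interpreted by the shell, e.g. mpirun --mca osc '^rdma' -np 2 ./a.out)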

Cheers,

Gilles

On Sat, Feb 16, 2019 at 8:43 PM b...@bartjanssens.org
 wrote:
>
> Hi,
>
> Running the following test code on two processes:
>
> #include <mpi.h>
> #include <stdio.h>
> #include <unistd.h>
>
> #define N 2
>
> int main(int argc, char **argv)
> {
> int i, rank, num_procs, len, received[N], buf[N];
> MPI_Aint addrbuf[1], recvaddr[1];
> MPI_Win win, awin;
>
> MPI_Init(&argc, &argv);
> MPI_Comm_rank(MPI_COMM_WORLD, &rank);
> MPI_Comm_size(MPI_COMM_WORLD, &num_procs);
>
> MPI_Win_create_dynamic(MPI_INFO_NULL, MPI_COMM_WORLD, &win);
> MPI_Win_attach(win, buf, sizeof(int)*N);
> MPI_Win_create(addrbuf, sizeof(MPI_Aint), sizeof(MPI_Aint), MPI_INFO_NULL, 
> MPI_COMM_WORLD, &awin);
>
> MPI_Win_lock(MPI_LOCK_EXCLUSIVE, rank, 0, awin);
> MPI_Get_address(buf, &addrbuf[0]);
> MPI_Win_unlock(rank,awin);
>
> if(rank == 0)
> {
> printf("Process %d is waiting for debugger attach\n", getpid());
> sleep(15);
> }
>
> MPI_Barrier(MPI_COMM_WORLD);
>
> if(rank == 0)
> {
> for(int r = 0; r != N; ++r)
> {
> MPI_Win_lock(MPI_LOCK_EXCLUSIVE, r, 0, awin);
> MPI_Get(recvaddr, 1, MPI_AINT, r, 0, 1, MPI_AINT, awin);
> MPI_Win_unlock(r, awin);
> MPI_Win_lock(MPI_LOCK_EXCLUSIVE, r, 0, win);
> MPI_Get(received, N, MPI_INT, r, recvaddr[0], N, MPI_INT, win);
> printf("First value from %d is %d\n", r, received[0]);
> MPI_Win_unlock(r, win);
> }
> }
>
> MPI_Barrier(MPI_COMM_WORLD);
>
> MPI_Win_free(&win);
> MPI_Finalize();
> return 0;
> }
>
>
> results in a crash with this backtrace (starting at the second MPI_Get line 
> in my code above):
>
> #0  mca_btl_vader_get_cma (btl=0x7f44888d0220 , endpoint=0x0, 
> local_address=0x74a13c18, remote_address=, 
> local_handle=0x0,
> remote_handle=, size=8, flags=0, order=255, 
> cbfunc=0x7f4488231250 , cbcontext=0x555d01e1c060, 
> cbdata=0x0) at btl_vader_get.c:95
> #1  0x7f44882308c1 in ompi_osc_rdma_get_contig 
> (sync=sync@entry=0x555d01e1be90, peer=peer@entry=0x555d01e16f10, 
> source_address=,
> source_address@entry=140737297595424, 
> source_handle=source_handle@entry=0x7f448a747180, target_buffer= out>, target_buffer@entry=0x74a13c18, size=size@entry=8,
> request=) at osc_rdma_comm.c:698
> #2  0x7f44882354b6 in ompi_osc_rdma_master (alloc_reqs=true, 
> rdma_fn=0x7f4488230610 , max_rdma_len= out>, request=0x555d01e1c060,
> remote_datatype=0x555d0004a2c0 , remote_count= out>, remote_handle=0x7f448a747180, remote_address=, 
> peer=,
> local_datatype=0x555d0004a2c0 , local_count= out>, local_address=0x74a13c18, sync=0x555d01e1be90) at 
> osc_rdma_comm.c:349
> #3  ompi_osc_rdma_get_w_req (request=0x0, source_datatype=0x555d0004a2c0 
> , source_count=, source_disp=, 
> peer=,
> origin_datatype=0x555d0004a2c0 , origin_count= out>, origin_addr=0x74a13c18, sync=0x555d01e1be90) at osc_rdma_comm.c:803
> #4  ompi_osc_rdma_get (origin_addr=0x74a13c18, origin_count= out>, origin_datatype=0x555d0004a2c0 , source_rank= out>,
> source_disp=, source_count=, 
> source_datatype=0x555d0004a2c0 , win=0x555d01e0aae0) at 
> osc_rdma_comm.c:880
> #5  0x7f448b404b6b in PMPI_Get (origin_addr=0x74a13c18, 
> origin_count=2, origin_datatype=0x555d0004a2c0 , 
> target_rank=,
> target_disp=, target_count=, 
> target_datatype=0x555d0004a2c0 , win=0x555d01e0aae0) at 
> pget.c:81
> #6  0x555d00047430 in main (argc=1, argv=0x74a13d18) at 
> onesided_crash_report.c:41
>
> On OpenMPI 3.1.3 the code works fine. Am I doing something wrong, or is this 
> a bug?
>
> Kind regards,
>
> Bart ___
> users mailing list
> users@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/users
___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users


[OMPI users] Segfault with OpenMPI 4 and dynamic window

2019-02-16 Thread bart

Hi,

Running the following test code on two processes:

#include <mpi.h>
#include <stdio.h>
#include <unistd.h>

#define N 2

int main(int argc, char **argv)
{
    int i, rank, num_procs, len, received[N], buf[N];
    MPI_Aint addrbuf[1], recvaddr[1];
    MPI_Win win, awin;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &num_procs);

    MPI_Win_create_dynamic(MPI_INFO_NULL, MPI_COMM_WORLD, &win);
    MPI_Win_attach(win, buf, sizeof(int)*N);
    MPI_Win_create(addrbuf, sizeof(MPI_Aint), sizeof(MPI_Aint), MPI_INFO_NULL,
                   MPI_COMM_WORLD, &awin);

    MPI_Win_lock(MPI_LOCK_EXCLUSIVE, rank, 0, awin);
    MPI_Get_address(buf, &addrbuf[0]);
    MPI_Win_unlock(rank, awin);

    if(rank == 0)
    {
        printf("Process %d is waiting for debugger attach\n", getpid());
        sleep(15);
    }

    MPI_Barrier(MPI_COMM_WORLD);

    if(rank == 0)
    {
        for(int r = 0; r != N; ++r)
        {
            MPI_Win_lock(MPI_LOCK_EXCLUSIVE, r, 0, awin);
            MPI_Get(recvaddr, 1, MPI_AINT, r, 0, 1, MPI_AINT, awin);
            MPI_Win_unlock(r, awin);
            MPI_Win_lock(MPI_LOCK_EXCLUSIVE, r, 0, win);
            MPI_Get(received, N, MPI_INT, r, recvaddr[0], N, MPI_INT, win);
            printf("First value from %d is %d\n", r, received[0]);
            MPI_Win_unlock(r, win);
        }
    }

    MPI_Barrier(MPI_COMM_WORLD);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}
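
For reference, a typical way to build and run the reproducer (the source
file name is taken from the backtrace below):

mpicc -g onesided_crash_report.c -o onesided_crash_report
mpirun -np 2 ./onesided_crash_report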

results in a crash with this backtrace (starting at the second MPI_Get line in 
my code above):

#0  mca_btl_vader_get_cma (btl=0x7f44888d0220 , endpoint=0x0, local_address=0x74a13c18, remote_address=<optimized out>, local_handle=0x0,
    remote_handle=<optimized out>, size=8, flags=0, order=255, cbfunc=0x7f4488231250 , cbcontext=0x555d01e1c060, cbdata=0x0) at btl_vader_get.c:95
#1  0x7f44882308c1 in ompi_osc_rdma_get_contig (sync=sync@entry=0x555d01e1be90, peer=peer@entry=0x555d01e16f10, source_address=<optimized out>,
    source_address@entry=140737297595424, source_handle=source_handle@entry=0x7f448a747180, target_buffer=<optimized out>, target_buffer@entry=0x74a13c18, size=size@entry=8,
    request=<optimized out>) at osc_rdma_comm.c:698
#2  0x7f44882354b6 in ompi_osc_rdma_master (alloc_reqs=true, rdma_fn=0x7f4488230610 , max_rdma_len=<optimized out>, request=0x555d01e1c060,
    remote_datatype=0x555d0004a2c0 , remote_count=<optimized out>, remote_handle=0x7f448a747180, remote_address=<optimized out>, peer=<optimized out>,
    local_datatype=0x555d0004a2c0 , local_count=<optimized out>, local_address=0x74a13c18, sync=0x555d01e1be90) at osc_rdma_comm.c:349
#3  ompi_osc_rdma_get_w_req (request=0x0, source_datatype=0x555d0004a2c0 , source_count=<optimized out>, source_disp=<optimized out>, peer=<optimized out>,
    origin_datatype=0x555d0004a2c0 , origin_count=<optimized out>, origin_addr=0x74a13c18, sync=0x555d01e1be90) at osc_rdma_comm.c:803
#4  ompi_osc_rdma_get (origin_addr=0x74a13c18, origin_count=<optimized out>, origin_datatype=0x555d0004a2c0 , source_rank=<optimized out>,
    source_disp=<optimized out>, source_count=<optimized out>, source_datatype=0x555d0004a2c0 , win=0x555d01e0aae0) at osc_rdma_comm.c:880
#5  0x7f448b404b6b in PMPI_Get (origin_addr=0x74a13c18, origin_count=2, origin_datatype=0x555d0004a2c0 , target_rank=<optimized out>,
    target_disp=<optimized out>, target_count=<optimized out>, target_datatype=0x555d0004a2c0 , win=0x555d01e0aae0) at pget.c:81
#6  0x555d00047430 in main (argc=1, argv=0x74a13d18) at onesided_crash_report.c:41

On OpenMPI 3.1.3 the code works fine. Am I doing something wrong, or is this a 
bug?

Kind regards,

Bart
___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users