Re: [OMPI users] Regression: multiple memory regions in dynamic windows

2016-08-25 Thread Nathan Hjelm
Fixed on master. The fix will be in 2.0.2 but you can apply it to 2.0.0 or 2.0.1:
https://github.com/open-mpi/ompi/commit/e53de7ecbe9f034ab92c832330089cf7065181dc.patch

-Nathan

On Aug 25, 2016, at 07:31 AM, Joseph Schuchart wrote:

Gilles,

Thanks for your fast reply. I did some last minute changes to the example code and didn't fully check the consistency of the output. Also, thanks for pointing out the mistake in computing the neighbor rank. I am attaching a fixed version.

Best
Joseph

On 08/25/2016 03:11 PM, Gilles Gouaillardet wrote:

Joseph,

at first glance, there is a memory corruption (!)
the first printf should be 0 -> 100, instead of 0 -> 3200

this is very odd because nelems is const, and the compiler might not even allocate this variable.

I also noted some counter intuitive stuff in your test program
(which still looks valid to me)

neighbor = (rank +1) / size;
should it be
neighbor = (rank + 1) % size;
instead ?

the first loop is
for (elem=0; elem < nelems-1; elem++) ...
it could be
for (elem=0; elem < nelems; elem++) ...

the second loop uses disp_set, and I guess you meant to use disp_set2

I will try to reproduce this crash.
which compiler (vendor and version) are you using ?
which compiler options do you pass to mpicc ?

Cheers,

Gilles

On Thursday, August 25, 2016, Joseph Schuchart wrote:

All,

It seems there is a regression in the handling of dynamic windows between Open MPI 1.10.3 and 2.0.0. I am attaching a test case that works fine with Open MPI 1.8.3 and fails with version 2.0.0 with the following output:

===
[0] MPI_Get 0 -> 3200 on first memory region
[cl3fr1:7342] *** An error occurred in MPI_Get
[cl3fr1:7342] *** reported by process [908197889,0]
[cl3fr1:7342] *** on win rdma window 3
[cl3fr1:7342] *** MPI_ERR_RMA_RANGE: invalid RMA address range
[cl3fr1:7342] *** MPI_ERRORS_ARE_FATAL (processes in this win will now abort,
[cl3fr1:7342] ***    and potentially your MPI job)
===

Expected output is:
===
[0] MPI_Get 0 -> 100 on first memory region:
[0] Done.
[0] MPI_Get 0 -> 100 on second memory region:
[0] Done.
===

The code allocates a dynamic window and attaches two memory regions to it before accessing both memory regions using MPI_Get. With Open MPI 2.0.0, access to both memory regions fails. Access to the first memory region only succeeds if the second memory region is not attached. With Open MPI 1.10.3, all MPI operations succeed.

Please let me know if you need any additional information or think that my code example is not standard compliant.

Best regards
Joseph

--
Dipl.-Inf. Joseph Schuchart
High Performance Computing Center Stuttgart (HLRS)
Nobelstr. 19
D-70569 Stuttgart

Tel.: +49(0)711-68565890
Fax: +49(0)711-6856832
E-Mail: schuch...@hlrs.de
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

--
Dipl.-Inf. Joseph Schuchart
High Performance Computing Center Stuttgart (HLRS)
Nobelstr. 19
D-70569 Stuttgart

Tel.: +49(0)711-68565890
Fax: +49(0)711-6856832
E-Mail: schuch...@hlrs.de
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

/*
 * mpi_dynamic_win.cc
 *
 *  Created on: Aug 24, 2016
 *  Author: joseph
 */

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

static int allocate_shared(size_t bufsize, MPI_Win win, MPI_Aint *disp_set) {
  int ret;
  char *sub_mem;
  MPI_Aint disp;

  sub_mem = malloc(bufsize * sizeof(char));

  /* Attach the allocated shared memory to the dynamic window */
  ret = MPI_Win_attach(win, sub_mem, bufsize);

  if (ret != MPI_SUCCESS) {
printf("MPI_Win_attach failed!\n");
return -1;
  }

  /* Get the local address */
  ret = MPI_Get_address(sub_mem, &disp);

  if (ret != MPI_SUCCESS) {
printf("MPI_Get_address failed!\n");
return -1;
  }

  /* Publish addresses */
  ret = MPI_Allgather(&disp, 1, MPI_AINT, disp_set, 1, MPI_AINT, MPI_COMM_WORLD);

  if (ret != MPI_SUCCESS) {
printf("MPI_Allgather failed!\n");
return -1;
  }

  return 0;
}

int main(int argc, char **argv)
{
  MPI_Win win;
  const size_t nelems = 10*10;
  const size_t bufsize = nelems * sizeof(double);
  MPI_Aint   *disp_set, *disp_set2;
  int rank, size;

  double buf[nelems];

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  disp_set  = (MPI_Aint*) malloc(size * sizeof(MPI_Aint));
  disp_set2 = (MPI_Aint*) malloc(size * sizeof(MPI_Aint));

  int ret = MPI_Win_create_dynamic(MPI_INFO_NULL, MPI_COMM_WORLD, &win);
  if (ret != MPI_SUCCESS) {
printf("MPI_Win_create_dynamic failed!\n");
exit(1);
  }

  
  MPI_Win_lock_all (0, win);

  /* Allocate two shared windows */
  allocate_shared(bufsize, win, disp_set);  
  allocate_shared(bufsize, win, disp_set2);  

  /* Initiate a get */
  {
int elem;
int neighbor = (rank + 1) % size;
if (rank == 0) printf("[%i] MPI_Get 0 -> %zu on first memory region: \n", rank, nelems);
for 

Re: [OMPI users] Regression: multiple memory regions in dynamic windows

2016-08-25 Thread Nathan Hjelm

There is a bug in the code that keeps the dynamic regions sorted. Should have 
it fixed shortly.

-Nathan

On Aug 25, 2016, at 07:46 AM, Christoph Niethammer  wrote:

Hello,

The error is not 100% reproducible for me, but it seems to disappear
entirely if one excludes one of the components via
-mca osc ^rdma
or
-mca btl ^openib

The error is present in 2.0.0 and also 2.0.1rc1.

Best
Christoph Niethammer



- Original Message -
From: "Joseph Schuchart" 
To: users@lists.open-mpi.org
Sent: Thursday, August 25, 2016 2:07:17 PM
Subject: [OMPI users] Regression: multiple memory regions in dynamic windows

All,

It seems there is a regression in the handling of dynamic windows 
between Open MPI 1.10.3 and 2.0.0. I am attaching a test case that works 
fine with Open MPI 1.8.3 and fails with version 2.0.0 with the following 
output:


===
[0] MPI_Get 0 -> 3200 on first memory region
[cl3fr1:7342] *** An error occurred in MPI_Get
[cl3fr1:7342] *** reported by process [908197889,0]
[cl3fr1:7342] *** on win rdma window 3
[cl3fr1:7342] *** MPI_ERR_RMA_RANGE: invalid RMA address range
[cl3fr1:7342] *** MPI_ERRORS_ARE_FATAL (processes in this win will now 
abort,

[cl3fr1:7342] *** and potentially your MPI job)
===

Expected output is:
===
[0] MPI_Get 0 -> 100 on first memory region:
[0] Done.
[0] MPI_Get 0 -> 100 on second memory region:
[0] Done.
===

The code allocates a dynamic window and attaches two memory regions to 
it before accessing both memory regions using MPI_Get. With Open MPI 
2.0.0, access to both memory regions fails. Access to the first 
memory region only succeeds if the second memory region is not attached. 
With Open MPI 1.10.3, all MPI operations succeed.


Please let me know if you need any additional information or think that 
my code example is not standard compliant.


Best regards
Joseph


--
Dipl.-Inf. Joseph Schuchart
High Performance Computing Center Stuttgart (HLRS)
Nobelstr. 19
D-70569 Stuttgart

Tel.: +49(0)711-68565890
Fax: +49(0)711-6856832
E-Mail: schuch...@hlrs.de


___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] Regression: multiple memory regions in dynamic windows

2016-08-25 Thread Howard Pritchard
Hi Joseph,

Thanks for reporting this problem.

There's an issue now (#2012)
https://github.com/open-mpi/ompi/issues/2012

to track this.

Howard


2016-08-25 7:44 GMT-06:00 Christoph Niethammer :

> Hello,
>
> The error is not 100% reproducible for me, but it seems to
> disappear entirely if one excludes one of the components via
> -mca osc ^rdma
> or
> -mca btl ^openib
>
> The error is present in 2.0.0 and also 2.0.1rc1.
>
> Best
> Christoph Niethammer
>
>
>
> - Original Message -
> From: "Joseph Schuchart" 
> To: users@lists.open-mpi.org
> Sent: Thursday, August 25, 2016 2:07:17 PM
> Subject: [OMPI users] Regression: multiple memory regions in dynamic
> windows
>
> All,
>
> It seems there is a regression in the handling of dynamic windows
> between Open MPI 1.10.3 and 2.0.0. I am attaching a test case that works
> fine with Open MPI 1.8.3 and fails with version 2.0.0 with the following
> output:
>
> ===
> [0] MPI_Get 0 -> 3200 on first memory region
> [cl3fr1:7342] *** An error occurred in MPI_Get
> [cl3fr1:7342] *** reported by process [908197889,0]
> [cl3fr1:7342] *** on win rdma window 3
> [cl3fr1:7342] *** MPI_ERR_RMA_RANGE: invalid RMA address range
> [cl3fr1:7342] *** MPI_ERRORS_ARE_FATAL (processes in this win will now
> abort,
> [cl3fr1:7342] ***and potentially your MPI job)
> ===
>
> Expected output is:
> ===
> [0] MPI_Get 0 -> 100 on first memory region:
> [0] Done.
> [0] MPI_Get 0 -> 100 on second memory region:
> [0] Done.
> ===
>
> The code allocates a dynamic window and attaches two memory regions to
> it before accessing both memory regions using MPI_Get. With Open MPI
> 2.0.0, access to both memory regions fails. Access to the first
> memory region only succeeds if the second memory region is not attached.
> With Open MPI 1.10.3, all MPI operations succeed.
>
> Please let me know if you need any additional information or think that
> my code example is not standard compliant.
>
> Best regards
> Joseph
>
>
> --
> Dipl.-Inf. Joseph Schuchart
> High Performance Computing Center Stuttgart (HLRS)
> Nobelstr. 19
> D-70569 Stuttgart
>
> Tel.: +49(0)711-68565890
> Fax: +49(0)711-6856832
> E-Mail: schuch...@hlrs.de
>
>
> ___
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users
> ___
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users
>
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] stdin issue with openmpi/2.0.0

2016-08-25 Thread Jeff Squyres (jsquyres)
The IOF fix PR for v2.0.1 was literally just merged a few minutes ago; it 
wasn't in last night's tarball.



> On Aug 25, 2016, at 10:59 AM, r...@open-mpi.org wrote:
> 
> ??? Weird - can you send me an updated output of that last test we ran?
> 
>> On Aug 25, 2016, at 7:51 AM, Jingchao Zhang  wrote:
>> 
>> Hi Ralph,
>> 
>> I saw the pull request and did a test with v2.0.1rc1, but the problem 
>> persists. Any ideas?
>> 
>> Thanks,
>> 
>> Dr. Jingchao Zhang
>> Holland Computing Center
>> University of Nebraska-Lincoln
>> 402-472-6400
>> From: users  on behalf of 
>> r...@open-mpi.org 
>> Sent: Wednesday, August 24, 2016 1:27:28 PM
>> To: Open MPI Users
>> Subject: Re: [OMPI users] stdin issue with openmpi/2.0.0
>>  
>> Bingo - found it, fix submitted and hope to get it into 2.0.1
>> 
>> Thanks for the assist!
>> Ralph
>> 
>> 
>>> On Aug 24, 2016, at 12:15 PM, Jingchao Zhang  wrote:
>>> 
>>> I configured v2.0.1rc1 with --enable-debug and ran the test with --mca 
>>> iof_base_verbose 100. I also added -display-devel-map in case it provides 
>>> some useful information.
>>> 
>>> Test job has 2 nodes, each node 10 cores. Rank 0 and mpirun command on the 
>>> same node.
>>> $ mpirun -display-devel-map --mca iof_base_verbose 100 ./a.out < test.in &> 
>>> debug_info.txt
>>> 
>>> The debug_info.txt is attached. 
>>> 
>>> Dr. Jingchao Zhang
>>> Holland Computing Center
>>> University of Nebraska-Lincoln
>>> 402-472-6400
>>> From: users  on behalf of 
>>> r...@open-mpi.org 
>>> Sent: Wednesday, August 24, 2016 12:14:26 PM
>>> To: Open MPI Users
>>> Subject: Re: [OMPI users] stdin issue with openmpi/2.0.0
>>>  
>>> Afraid I can’t replicate a problem at all, whether rank=0 is local or not. 
>>> I’m also using bash, but on CentOS-7, so I suspect the OS is the difference.
>>> 
>>> Can you configure OMPI with --enable-debug, and then run the test again 
>>> with --mca iof_base_verbose 100? It will hopefully tell us something about 
>>> why the IO subsystem is stuck.
>>> 
>>> 
 On Aug 24, 2016, at 8:46 AM, Jingchao Zhang  wrote:
 
 Hi Ralph,
 
 For our tests, rank 0 is always on the same node with mpirun. I just 
 tested mpirun with -nolocal and it still hangs. 
 
 Information on shell and OS
 $ echo $0
 -bash
 
 $ lsb_release -a
 LSB Version:
 :base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
 Distributor ID: Scientific
 Description:Scientific Linux release 6.8 (Carbon)
 Release:6.8
 Codename:   Carbon
 
 $ uname -a
 Linux login.crane.hcc.unl.edu 2.6.32-642.3.1.el6.x86_64 #1 SMP Tue Jul 12 
 11:25:51 CDT 2016 x86_64 x86_64 x86_64 GNU/Linux
 
 
 Dr. Jingchao Zhang
 Holland Computing Center
 University of Nebraska-Lincoln
 402-472-6400
 From: users  on behalf of 
 r...@open-mpi.org 
 Sent: Tuesday, August 23, 2016 8:14:48 PM
 To: Open MPI Users
 Subject: Re: [OMPI users] stdin issue with openmpi/2.0.0
  
 Hmmm...that’s a good point. Rank 0 and mpirun are always on the same node 
 on my cluster. I’ll give it a try.
 
 Jingchao: is rank 0 on the node with mpirun, or on a remote node?
 
 
> On Aug 23, 2016, at 5:58 PM, Gilles Gouaillardet  
> wrote:
> 
> Ralph,
> 
> did you run task 0 and mpirun on different nodes ?
> 
> i observed some random hangs, though i cannot blame openmpi 100% yet
> 
> Cheers,
> 
> Gilles
> 
> On 8/24/2016 9:41 AM, r...@open-mpi.org wrote:
>> Very strange. I cannot reproduce it as I’m able to run any number of 
>> nodes and procs, pushing over 100Mbytes thru without any problem.
>> 
>> Which leads me to suspect that the issue here is with the tty interface. 
>> Can you tell me what shell and OS you are running?
>> 
>> 
>>> On Aug 23, 2016, at 3:25 PM, Jingchao Zhang  wrote:
>>> 
>>> Everything stuck at MPI_Init. For a test job with 2 nodes and 10 cores 
>>> each node, I got the following
>>> 
>>> $ mpirun ./a.out < test.in
>>> Rank 2 has cleared MPI_Init
>>> Rank 4 has cleared MPI_Init
>>> Rank 7 has cleared MPI_Init
>>> Rank 8 has cleared MPI_Init
>>> Rank 0 has cleared MPI_Init
>>> Rank 5 has cleared MPI_Init
>>> Rank 6 has cleared MPI_Init
>>> Rank 9 has cleared MPI_Init
>>> Rank 1 has cleared MPI_Init
>>> Rank 16 has cleared MPI_Init
>>> Rank 19 has cleared MPI_Init
>>> Rank 10 has cleared MPI_Init
>>> Rank 11 has cleared MPI_Init
>>> Rank 12 has cleared MPI_Init
>>> Rank 13 has cleared MPI_Init
>>> Rank 14 has cleared MPI_Init

Re: [OMPI users] stdin issue with openmpi/2.0.0

2016-08-25 Thread Jingchao Zhang
$ grep stdin_target orte/runtime/orte_globals.c
635:job->stdin_target = 0;


Recompiled with --enable-debug.


Same test case: 2 nodes, each node 10 cores. Rank 0 and mpirun command on the 
same node.

$ mpirun -display-devel-map --mca iof_base_verbose 100 ./a.out < test.in &> 
debug_info2.txt


The new debug_info2.txt file is attached.


Dr. Jingchao Zhang
Holland Computing Center
University of Nebraska-Lincoln
402-472-6400

From: users  on behalf of r...@open-mpi.org 

Sent: Thursday, August 25, 2016 8:59:23 AM
To: Open MPI Users
Subject: Re: [OMPI users] stdin issue with openmpi/2.0.0

??? Weird - can you send me an updated output of that last test we ran?

On Aug 25, 2016, at 7:51 AM, Jingchao Zhang 
> wrote:

Hi Ralph,

I saw the pull request and did a test with v2.0.1rc1, but the problem persists. 
Any ideas?

Thanks,

Dr. Jingchao Zhang
Holland Computing Center
University of Nebraska-Lincoln
402-472-6400

From: users 
> on 
behalf of r...@open-mpi.org 
>
Sent: Wednesday, August 24, 2016 1:27:28 PM
To: Open MPI Users
Subject: Re: [OMPI users] stdin issue with openmpi/2.0.0

Bingo - found it, fix submitted and hope to get it into 2.0.1

Thanks for the assist!
Ralph


On Aug 24, 2016, at 12:15 PM, Jingchao Zhang 
> wrote:

I configured v2.0.1rc1 with --enable-debug and ran the test with --mca 
iof_base_verbose 100. I also added -display-devel-map in case it provides some 
useful information.

Test job has 2 nodes, each node 10 cores. Rank 0 and mpirun command on the same 
node.
$ mpirun -display-devel-map --mca iof_base_verbose 100 ./a.out < test.in &> 
debug_info.txt

The debug_info.txt is attached.

Dr. Jingchao Zhang
Holland Computing Center
University of Nebraska-Lincoln
402-472-6400

From: users 
> on 
behalf of r...@open-mpi.org 
>
Sent: Wednesday, August 24, 2016 12:14:26 PM
To: Open MPI Users
Subject: Re: [OMPI users] stdin issue with openmpi/2.0.0

Afraid I can’t replicate a problem at all, whether rank=0 is local or not. I’m 
also using bash, but on CentOS-7, so I suspect the OS is the difference.

Can you configure OMPI with --enable-debug, and then run the test again with 
--mca iof_base_verbose 100? It will hopefully tell us something about why the 
IO subsystem is stuck.
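
For reference, a typical sequence for that would be the following; the install
prefix and the run command are placeholders, not taken from this thread:

$ ./configure --prefix=$HOME/ompi-debug --enable-debug
$ make -j8 install
$ mpirun --mca iof_base_verbose 100 ./a.out < test.in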


On Aug 24, 2016, at 8:46 AM, Jingchao Zhang 
> wrote:

Hi Ralph,

For our tests, rank 0 is always on the same node with mpirun. I just tested 
mpirun with -nolocal and it still hangs.

Information on shell and OS

$ echo $0
-bash

$ lsb_release -a
LSB Version:
:base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
Distributor ID: Scientific
Description:Scientific Linux release 6.8 (Carbon)
Release:6.8
Codename:   Carbon


$ uname -a
Linux login.crane.hcc.unl.edu 
2.6.32-642.3.1.el6.x86_64 #1 SMP Tue Jul 12 11:25:51 CDT 2016 x86_64 x86_64 
x86_64 GNU/Linux


Dr. Jingchao Zhang
Holland Computing Center
University of Nebraska-Lincoln
402-472-6400

From: users 
> on 
behalf of r...@open-mpi.org 
>
Sent: Tuesday, August 23, 2016 8:14:48 PM
To: Open MPI Users
Subject: Re: [OMPI users] stdin issue with openmpi/2.0.0

Hmmm...that’s a good point. Rank 0 and mpirun are always on the same node on my 
cluster. I’ll give it a try.

Jingchao: is rank 0 on the node with mpirun, or on a remote node?


On Aug 23, 2016, at 5:58 PM, Gilles Gouaillardet 
> wrote:

Ralph,

did you run task 0 and mpirun on different nodes ?

i observed some random hangs, though i cannot blame openmpi 100% yet

Cheers,

Gilles

On 8/24/2016 9:41 AM, r...@open-mpi.org wrote:
Very strange. I cannot reproduce it as I’m able to run any number of nodes and 
procs, pushing over 100Mbytes thru without any problem.

Which leads me to suspect that the issue here is with the tty interface. Can 
you tell me what shell and OS you are running?


On Aug 23, 2016, at 3:25 PM, Jingchao Zhang 
> wrote:

Everything stuck at MPI_Init. For a test job with 2 nodes and 10 cores each 
node, I got the following

$ mpirun ./a.out < test.in
Rank 2 has cleared MPI_Init
Rank 4 has cleared MPI_Init
Rank 7 has cleared MPI_Init
Rank 8 

Re: [OMPI users] stdin issue with openmpi/2.0.0

2016-08-25 Thread r...@open-mpi.org
??? Weird - can you send me an updated output of that last test we ran?

> On Aug 25, 2016, at 7:51 AM, Jingchao Zhang  wrote:
> 
> Hi Ralph,
> 
> I saw the pull request and did a test with v2.0.1rc1, but the problem 
> persists. Any ideas?
> 
> Thanks,
> 
> Dr. Jingchao Zhang
> Holland Computing Center
> University of Nebraska-Lincoln
> 402-472-6400
> From: users  > on behalf of r...@open-mpi.org 
>  >
> Sent: Wednesday, August 24, 2016 1:27:28 PM
> To: Open MPI Users
> Subject: Re: [OMPI users] stdin issue with openmpi/2.0.0
>  
> Bingo - found it, fix submitted and hope to get it into 2.0.1
> 
> Thanks for the assist!
> Ralph
> 
> 
>> On Aug 24, 2016, at 12:15 PM, Jingchao Zhang > > wrote:
>> 
>> I configured v2.0.1rc1 with --enable-debug and ran the test with --mca 
>> iof_base_verbose 100. I also added -display-devel-map in case it provides 
>> some useful information.
>> 
>> Test job has 2 nodes, each node 10 cores. Rank 0 and mpirun command on the 
>> same node.
>> $ mpirun -display-devel-map --mca iof_base_verbose 100 ./a.out < test.in &> 
>> debug_info.txt
>> 
>> The debug_info.txt is attached. 
>> 
>> Dr. Jingchao Zhang
>> Holland Computing Center
>> University of Nebraska-Lincoln
>> 402-472-6400
>> From: users > > on behalf of r...@open-mpi.org 
>>  >
>> Sent: Wednesday, August 24, 2016 12:14:26 PM
>> To: Open MPI Users
>> Subject: Re: [OMPI users] stdin issue with openmpi/2.0.0
>>  
>> Afraid I can’t replicate a problem at all, whether rank=0 is local or not. 
>> I’m also using bash, but on CentOS-7, so I suspect the OS is the difference.
>> 
>> Can you configure OMPI with --enable-debug, and then run the test again with 
>> --mca iof_base_verbose 100? It will hopefully tell us something about why 
>> the IO subsystem is stuck.
>> 
>> 
>>> On Aug 24, 2016, at 8:46 AM, Jingchao Zhang >> > wrote:
>>> 
>>> Hi Ralph,
>>> 
>>> For our tests, rank 0 is always on the same node with mpirun. I just tested 
>>> mpirun with -nolocal and it still hangs. 
>>> 
>>> Information on shell and OS
>>> $ echo $0
>>> -bash
>>> 
>>> $ lsb_release -a
>>> LSB Version:
>>> :base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
>>> Distributor ID: Scientific
>>> Description:Scientific Linux release 6.8 (Carbon)
>>> Release:6.8
>>> Codename:   Carbon
>>> 
>>> $ uname -a
>>> Linux login.crane.hcc.unl.edu  
>>> 2.6.32-642.3.1.el6.x86_64 #1 SMP Tue Jul 12 11:25:51 CDT 2016 x86_64 x86_64 
>>> x86_64 GNU/Linux
>>> 
>>> 
>>> Dr. Jingchao Zhang
>>> Holland Computing Center
>>> University of Nebraska-Lincoln
>>> 402-472-6400
>>> From: users >> > on behalf of r...@open-mpi.org 
>>>  >
>>> Sent: Tuesday, August 23, 2016 8:14:48 PM
>>> To: Open MPI Users
>>> Subject: Re: [OMPI users] stdin issue with openmpi/2.0.0
>>>  
>>> Hmmm...that’s a good point. Rank 0 and mpirun are always on the same node 
>>> on my cluster. I’ll give it a try.
>>> 
>>> Jingchao: is rank 0 on the node with mpirun, or on a remote node?
>>> 
>>> 
 On Aug 23, 2016, at 5:58 PM, Gilles Gouaillardet > wrote:
 
 Ralph,
 
 did you run task 0 and mpirun on different nodes ?
 
 i observed some random hangs, though i cannot blame openmpi 100% yet
 
 Cheers,
 
 Gilles
 
 On 8/24/2016 9:41 AM, r...@open-mpi.org  wrote:
> Very strange. I cannot reproduce it as I’m able to run any number of 
> nodes and procs, pushing over 100Mbytes thru without any problem.
> 
> Which leads me to suspect that the issue here is with the tty interface. 
> Can you tell me what shell and OS you are running?
> 
> 
>> On Aug 23, 2016, at 3:25 PM, Jingchao Zhang > > wrote:
>> 
>> Everything stuck at MPI_Init. For a test job with 2 nodes and 10 cores 
>> each node, I got the following
>> 
>> $ mpirun ./a.out < test.in
>> Rank 2 has cleared MPI_Init
>> Rank 4 has cleared MPI_Init
>> Rank 7 has cleared MPI_Init
>> Rank 8 has cleared MPI_Init
>> Rank 0 has cleared MPI_Init
>> Rank 5 has cleared MPI_Init
>> Rank 6 has cleared MPI_Init
>> Rank 9 has cleared MPI_Init
>> Rank 1 has cleared MPI_Init
>> Rank 16 has cleared MPI_Init
>> Rank 19 has cleared MPI_Init
>> Rank 10 

Re: [OMPI users] stdin issue with openmpi/2.0.0

2016-08-25 Thread Jingchao Zhang
Hi Ralph,


I saw the pull request and did a test with v2.0.1rc1, but the problem persists. 
Any ideas?


Thanks,


Dr. Jingchao Zhang
Holland Computing Center
University of Nebraska-Lincoln
402-472-6400

From: users  on behalf of r...@open-mpi.org 

Sent: Wednesday, August 24, 2016 1:27:28 PM
To: Open MPI Users
Subject: Re: [OMPI users] stdin issue with openmpi/2.0.0

Bingo - found it, fix submitted and hope to get it into 2.0.1

Thanks for the assist!
Ralph


On Aug 24, 2016, at 12:15 PM, Jingchao Zhang 
> wrote:

I configured v2.0.1rc1 with --enable-debug and ran the test with --mca 
iof_base_verbose 100. I also added -display-devel-map in case it provides some 
useful information.

Test job has 2 nodes, each node 10 cores. Rank 0 and mpirun command on the same 
node.
$ mpirun -display-devel-map --mca iof_base_verbose 100 ./a.out < test.in &> 
debug_info.txt

The debug_info.txt is attached.

Dr. Jingchao Zhang
Holland Computing Center
University of Nebraska-Lincoln
402-472-6400

From: users 
> on 
behalf of r...@open-mpi.org 
>
Sent: Wednesday, August 24, 2016 12:14:26 PM
To: Open MPI Users
Subject: Re: [OMPI users] stdin issue with openmpi/2.0.0

Afraid I can’t replicate a problem at all, whether rank=0 is local or not. I’m 
also using bash, but on CentOS-7, so I suspect the OS is the difference.

Can you configure OMPI with --enable-debug, and then run the test again with 
--mca iof_base_verbose 100? It will hopefully tell us something about why the 
IO subsystem is stuck.


On Aug 24, 2016, at 8:46 AM, Jingchao Zhang 
> wrote:

Hi Ralph,

For our tests, rank 0 is always on the same node with mpirun. I just tested 
mpirun with -nolocal and it still hangs.

Information on shell and OS

$ echo $0
-bash

$ lsb_release -a
LSB Version:
:base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
Distributor ID: Scientific
Description:Scientific Linux release 6.8 (Carbon)
Release:6.8
Codename:   Carbon


$ uname -a
Linux login.crane.hcc.unl.edu 
2.6.32-642.3.1.el6.x86_64 #1 SMP Tue Jul 12 11:25:51 CDT 2016 x86_64 x86_64 
x86_64 GNU/Linux


Dr. Jingchao Zhang
Holland Computing Center
University of Nebraska-Lincoln
402-472-6400

From: users 
> on 
behalf of r...@open-mpi.org 
>
Sent: Tuesday, August 23, 2016 8:14:48 PM
To: Open MPI Users
Subject: Re: [OMPI users] stdin issue with openmpi/2.0.0

Hmmm...that’s a good point. Rank 0 and mpirun are always on the same node on my 
cluster. I’ll give it a try.

Jingchao: is rank 0 on the node with mpirun, or on a remote node?


On Aug 23, 2016, at 5:58 PM, Gilles Gouaillardet 
> wrote:

Ralph,

did you run task 0 and mpirun on different nodes ?

i observed some random hangs, though i cannot blame openmpi 100% yet

Cheers,

Gilles

On 8/24/2016 9:41 AM, r...@open-mpi.org wrote:
Very strange. I cannot reproduce it as I’m able to run any number of nodes and 
procs, pushing over 100Mbytes thru without any problem.

Which leads me to suspect that the issue here is with the tty interface. Can 
you tell me what shell and OS you are running?


On Aug 23, 2016, at 3:25 PM, Jingchao Zhang 
> wrote:

Everything stuck at MPI_Init. For a test job with 2 nodes and 10 cores each 
node, I got the following

$ mpirun ./a.out < test.in
Rank 2 has cleared MPI_Init
Rank 4 has cleared MPI_Init
Rank 7 has cleared MPI_Init
Rank 8 has cleared MPI_Init
Rank 0 has cleared MPI_Init
Rank 5 has cleared MPI_Init
Rank 6 has cleared MPI_Init
Rank 9 has cleared MPI_Init
Rank 1 has cleared MPI_Init
Rank 16 has cleared MPI_Init
Rank 19 has cleared MPI_Init
Rank 10 has cleared MPI_Init
Rank 11 has cleared MPI_Init
Rank 12 has cleared MPI_Init
Rank 13 has cleared MPI_Init
Rank 14 has cleared MPI_Init
Rank 15 has cleared MPI_Init
Rank 17 has cleared MPI_Init
Rank 18 has cleared MPI_Init
Rank 3 has cleared MPI_Init

then it just hanged.
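
A minimal reproducer along these lines (an assumption about what ./a.out does,
not the actual test program from this thread) would be:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
  int rank;
  char line[1024];
  long nlines = 0;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  printf("Rank %d has cleared MPI_Init\n", rank);

  /* rank 0 consumes the stdin that mpirun forwards to it */
  if (rank == 0) {
    while (fgets(line, sizeof(line), stdin) != NULL)
      nlines++;
    printf("Rank 0 read %ld lines from stdin\n", nlines);
  }

  MPI_Finalize();
  return 0;
}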

--Jingchao

Dr. Jingchao Zhang
Holland Computing Center
University of Nebraska-Lincoln
402-472-6400

From: users 
> on 
behalf of r...@open-mpi.org 
>
Sent: Tuesday, August 23, 2016 4:03:07 PM
To: Open MPI Users
Subject: Re: [OMPI users] stdin issue with 

Re: [OMPI users] Regression: multiple memory regions in dynamic windows

2016-08-25 Thread Christoph Niethammer
Hello,

The error is not 100% reproducible for me, but it seems to disappear
entirely if one excludes one of the components via
-mca osc ^rdma
or
-mca btl ^openib
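
For illustration, a run with one of those components excluded would look like
the following; the binary name and process count are placeholders, not taken
from this thread:

$ mpirun -np 2 -mca osc ^rdma ./a.out
or
$ mpirun -np 2 -mca btl ^openib ./a.out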

The error is present in 2.0.0 and also 2.0.1rc1.

Best
Christoph Niethammer



- Original Message -
From: "Joseph Schuchart" 
To: users@lists.open-mpi.org
Sent: Thursday, August 25, 2016 2:07:17 PM
Subject: [OMPI users] Regression: multiple memory regions in dynamic windows

All,

It seems there is a regression in the handling of dynamic windows 
between Open MPI 1.10.3 and 2.0.0. I am attaching a test case that works 
fine with Open MPI 1.8.3 and fails with version 2.0.0 with the following 
output:

===
[0] MPI_Get 0 -> 3200 on first memory region
[cl3fr1:7342] *** An error occurred in MPI_Get
[cl3fr1:7342] *** reported by process [908197889,0]
[cl3fr1:7342] *** on win rdma window 3
[cl3fr1:7342] *** MPI_ERR_RMA_RANGE: invalid RMA address range
[cl3fr1:7342] *** MPI_ERRORS_ARE_FATAL (processes in this win will now 
abort,
[cl3fr1:7342] ***and potentially your MPI job)
===

Expected output is:
===
[0] MPI_Get 0 -> 100 on first memory region:
[0] Done.
[0] MPI_Get 0 -> 100 on second memory region:
[0] Done.
===

The code allocates a dynamic window and attaches two memory regions to 
it before accessing both memory regions using MPI_Get. With Open MPI 
2.0.0, access to both memory regions fails. Access to the first 
memory region only succeeds if the second memory region is not attached. 
With Open MPI 1.10.3, all MPI operations succeed.

Please let me know if you need any additional information or think that 
my code example is not standard compliant.

Best regards
Joseph


-- 
Dipl.-Inf. Joseph Schuchart
High Performance Computing Center Stuttgart (HLRS)
Nobelstr. 19
D-70569 Stuttgart

Tel.: +49(0)711-68565890
Fax: +49(0)711-6856832
E-Mail: schuch...@hlrs.de


___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users


Re: [OMPI users] Regression: multiple memory regions in dynamic windows

2016-08-25 Thread Joseph Schuchart

Gilles,

Thanks for your fast reply. I did some last minute changes to the 
example code and didn't fully check the consistency of the output. Also, 
thanks for pointing out the mistake in computing the neighbor rank. I am 
attaching a fixed version.


Best
Joseph

On 08/25/2016 03:11 PM, Gilles Gouaillardet wrote:

Joseph,

at first glance, there is a memory corruption (!)
the first printf should be 0 -> 100, instead of 0 -> 3200

this is very odd because nelems is const, and the compiler might not 
even allocate this variable.


I also noted some counter intuitive stuff in your test program
(which still looks valid to me)

neighbor = (rank +1) / size;
should it be
neighbor = (rank + 1) % size;
instead ?

the first loop is
for (elem=0; elem < nelems-1; elem++) ...
it could be
for (elem=0; elem < nelems; elem++) ...

the second loop uses disp_set, and I guess you meant to use disp_set2

I will try to reproduce this crash.
which compiler (vendor and version) are you using ?
which compiler options do you pass to mpicc ?


Cheers,

Gilles

On Thursday, August 25, 2016, Joseph Schuchart > wrote:


All,

It seems there is a regression in the handling of dynamic windows
between Open MPI 1.10.3 and 2.0.0. I am attaching a test case that
works fine with Open MPI 1.8.3 and fails with version 2.0.0 with
the following output:

===
[0] MPI_Get 0 -> 3200 on first memory region
[cl3fr1:7342] *** An error occurred in MPI_Get
[cl3fr1:7342] *** reported by process [908197889,0]
[cl3fr1:7342] *** on win rdma window 3
[cl3fr1:7342] *** MPI_ERR_RMA_RANGE: invalid RMA address range
[cl3fr1:7342] *** MPI_ERRORS_ARE_FATAL (processes in this win will
now abort,
[cl3fr1:7342] ***and potentially your MPI job)
===

Expected output is:
===
[0] MPI_Get 0 -> 100 on first memory region:
[0] Done.
[0] MPI_Get 0 -> 100 on second memory region:
[0] Done.
===

The code allocates a dynamic window and attaches two memory
regions to it before accessing both memory regions using MPI_Get.
With Open MPI 2.0.0, access to both memory regions fails.
Access to the first memory region only succeeds if the second
memory region is not attached. With Open MPI 1.10.3, all MPI
operations succeed.

Please let me know if you need any additional information or think
that my code example is not standard compliant.

Best regards
Joseph


-- 
Dipl.-Inf. Joseph Schuchart

High Performance Computing Center Stuttgart (HLRS)
Nobelstr. 19
D-70569 Stuttgart

Tel.: +49(0)711-68565890
Fax: +49(0)711-6856832
E-Mail: schuch...@hlrs.de



___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users


--
Dipl.-Inf. Joseph Schuchart
High Performance Computing Center Stuttgart (HLRS)
Nobelstr. 19
D-70569 Stuttgart

Tel.: +49(0)711-68565890
Fax: +49(0)711-6856832
E-Mail: schuch...@hlrs.de

/*
 * mpi_dynamic_win.cc
 *
 *  Created on: Aug 24, 2016
 *  Author: joseph
 */

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

static int allocate_shared(size_t bufsize, MPI_Win win, MPI_Aint *disp_set) {
  int ret;
  char *sub_mem;
  MPI_Aint disp;

  sub_mem = malloc(bufsize * sizeof(char));

  /* Attach the allocated shared memory to the dynamic window */
  ret = MPI_Win_attach(win, sub_mem, bufsize);

  if (ret != MPI_SUCCESS) {
printf("MPI_Win_attach failed!\n");
return -1;
  }

  /* Get the local address */
  ret = MPI_Get_address(sub_mem, &disp);

  if (ret != MPI_SUCCESS) {
printf("MPI_Get_address failed!\n");
return -1;
  }

  /* Publish addresses */
  ret = MPI_Allgather(&disp, 1, MPI_AINT, disp_set, 1, MPI_AINT, MPI_COMM_WORLD);

  if (ret != MPI_SUCCESS) {
printf("MPI_Allgather failed!\n");
return -1;
  }

  return 0;
}

int main(int argc, char **argv)
{
  MPI_Win win;
  const size_t nelems = 10*10;
  const size_t bufsize = nelems * sizeof(double);
  MPI_Aint   *disp_set, *disp_set2;
  int rank, size;

  double buf[nelems];

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  disp_set  = (MPI_Aint*) malloc(size * sizeof(MPI_Aint));
  disp_set2 = (MPI_Aint*) malloc(size * sizeof(MPI_Aint));

  int ret = MPI_Win_create_dynamic(MPI_INFO_NULL, MPI_COMM_WORLD, &win);
  if (ret != MPI_SUCCESS) {
printf("MPI_Win_create_dynamic failed!\n");
exit(1);
  }

  
  MPI_Win_lock_all (0, win);

  /* Allocate two shared windows */
  allocate_shared(bufsize, win, disp_set);  
  allocate_shared(bufsize, win, disp_set2);  

  /* Initiate a get */
  {
int elem;
int neighbor = (rank + 1) % size;
if (rank == 0) printf("[%i] MPI_Get 0 -> %zu on first memory region: \n", rank, nelems);
for (elem = 0; elem < nelems; elem++) {
  MPI_Aint off = elem * sizeof(double);
  //MPI_Aint disp = MPI_Aint_add(disp_set[neighbor], off);
  

Re: [OMPI users] Regression: multiple memory regions in dynamic windows

2016-08-25 Thread Gilles Gouaillardet
Joseph,

I also noted the MPI_Info "alloc_shared_noncontig" is unused.
I do not know whether this is necessary or not, but if you do want to use
it, this should be used once with MPI_Win_create_dynamic
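
A minimal sketch of that, reusing the variable names from the attached test
case (whether the key has any effect on a dynamic window is a separate
question):

  MPI_Info win_info;
  MPI_Info_create(&win_info);
  MPI_Info_set(win_info, "alloc_shared_noncontig", "true");
  /* pass the info object once, at window creation */
  ret = MPI_Win_create_dynamic(win_info, MPI_COMM_WORLD, &win);
  MPI_Info_free(&win_info);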

Cheers,

Gilles

On Thursday, August 25, 2016, Gilles Gouaillardet <
gilles.gouaillar...@gmail.com> wrote:

> Joseph,
>
> at first glance, there is a memory corruption (!)
> the first printf should be 0 -> 100, instead of 0 -> 3200
>
> this is very odd because nelems is const, and the compiler might not even
> allocate this variable.
>
> I also noted some counter intuitive stuff in your test program
> (which still looks valid to me)
>
> neighbor = (rank +1) / size;
> should it be
> neighbor = (rank + 1) % size;
> instead ?
>
> the first loop is
> for (elem=0; elem < nelems-1; elem++) ...
> it could be
> for (elem=0; elem < nelems; elem++) ...
>
> the second loop uses disp_set, and I guess you meant to use disp_set2
>
> I will try to reproduce this crash.
> which compiler (vendor and version) are you using ?
> which compiler options do you pass to mpicc ?
>
>
> Cheers,
>
> Gilles
>
> On Thursday, August 25, 2016, Joseph Schuchart  > wrote:
>
>> All,
>>
>> It seems there is a regression in the handling of dynamic windows between
>> Open MPI 1.10.3 and 2.0.0. I am attaching a test case that works fine with
> Open MPI 1.8.3 and fails with version 2.0.0 with the following output:
>>
>> ===
>> [0] MPI_Get 0 -> 3200 on first memory region
>> [cl3fr1:7342] *** An error occurred in MPI_Get
>> [cl3fr1:7342] *** reported by process [908197889,0]
>> [cl3fr1:7342] *** on win rdma window 3
>> [cl3fr1:7342] *** MPI_ERR_RMA_RANGE: invalid RMA address range
>> [cl3fr1:7342] *** MPI_ERRORS_ARE_FATAL (processes in this win will now
>> abort,
>> [cl3fr1:7342] ***and potentially your MPI job)
>> ===
>>
>> Expected output is:
>> ===
>> [0] MPI_Get 0 -> 100 on first memory region:
>> [0] Done.
>> [0] MPI_Get 0 -> 100 on second memory region:
>> [0] Done.
>> ===
>>
>> The code allocates a dynamic window and attaches two memory regions to it
>> before accessing both memory regions using MPI_Get. With Open MPI 2.0.0,
>> access to both memory regions fails. Access to the first memory
>> region only succeeds if the second memory region is not attached. With Open
>> MPI 1.10.3, all MPI operations succeed.
>>
>> Please let me know if you need any additional information or think that
>> my code example is not standard compliant.
>>
>> Best regards
>> Joseph
>>
>>
>> --
>> Dipl.-Inf. Joseph Schuchart
>> High Performance Computing Center Stuttgart (HLRS)
>> Nobelstr. 19
>> D-70569 Stuttgart
>>
>> Tel.: +49(0)711-68565890
>> Fax: +49(0)711-6856832
>> E-Mail: schuch...@hlrs.de
>>
>>
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] Regression: multiple memory regions in dynamic windows

2016-08-25 Thread Gilles Gouaillardet
Joseph,

at first glance, there is a memory corruption (!)
the first printf should be 0 -> 100, instead of 0 -> 3200

this is very odd because nelems is const, and the compiler might not even
allocate this variable.

I also noted some counter intuitive stuff in your test program
(which still looks valid to me)

neighbor = (rank +1) / size;
should it be
neighbor = (rank + 1) % size;
instead ?

the first loop is
for (elem=0; elem < nelems-1; elem++) ...
it could be
for (elem=0; elem < nelems; elem++) ...

the second loop uses disp_set, and I guess you meant to use disp_set2
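
Put together, the second loop with those suggestions applied would look roughly
like this (a sketch only, reusing the variables from the attached test case):

  int neighbor = (rank + 1) % size;            /* modulo instead of division */
  for (elem = 0; elem < nelems; elem++) {      /* cover all nelems elements */
    MPI_Aint off  = elem * sizeof(double);
    MPI_Aint disp = disp_set2[neighbor] + off; /* disp_set2 for the second region */
    MPI_Get(&buf[elem], sizeof(double), MPI_BYTE, neighbor, disp,
            sizeof(double), MPI_BYTE, win);
  }
  MPI_Win_flush(neighbor, win);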

I will try to reproduce this crash.
which compiler (vendor and version) are you using ?
which compiler options do you pass to mpicc ?


Cheers,

Gilles

On Thursday, August 25, 2016, Joseph Schuchart  wrote:

> All,
>
> It seems there is a regression in the handling of dynamic windows between
> Open MPI 1.10.3 and 2.0.0. I am attaching a test case that works fine with
> Open MPI 1.8.3 and fails with version 2.0.0 with the following output:
>
> ===
> [0] MPI_Get 0 -> 3200 on first memory region
> [cl3fr1:7342] *** An error occurred in MPI_Get
> [cl3fr1:7342] *** reported by process [908197889,0]
> [cl3fr1:7342] *** on win rdma window 3
> [cl3fr1:7342] *** MPI_ERR_RMA_RANGE: invalid RMA address range
> [cl3fr1:7342] *** MPI_ERRORS_ARE_FATAL (processes in this win will now
> abort,
> [cl3fr1:7342] ***and potentially your MPI job)
> ===
>
> Expected output is:
> ===
> [0] MPI_Get 0 -> 100 on first memory region:
> [0] Done.
> [0] MPI_Get 0 -> 100 on second memory region:
> [0] Done.
> ===
>
> The code allocates a dynamic window and attaches two memory regions to it
> before accessing both memory regions using MPI_Get. With Open MPI 2.0.0,
> access to both memory regions fails. Access to the first memory
> region only succeeds if the second memory region is not attached. With Open
> MPI 1.10.3, all MPI operations succeed.
>
> Please let me know if you need any additional information or think that my
> code example is not standard compliant.
>
> Best regards
> Joseph
>
>
> --
> Dipl.-Inf. Joseph Schuchart
> High Performance Computing Center Stuttgart (HLRS)
> Nobelstr. 19
> D-70569 Stuttgart
>
> Tel.: +49(0)711-68565890
> Fax: +49(0)711-6856832
> E-Mail: schuch...@hlrs.de
>
>
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

[OMPI users] Regression: multiple memory regions in dynamic windows

2016-08-25 Thread Joseph Schuchart

All,

It seems there is a regression in the handling of dynamic windows 
between Open MPI 1.10.3 and 2.0.0. I am attaching a test case that works 
fine with Open MPI 1.8.3 and fails with version 2.0.0 with the following 
output:


===
[0] MPI_Get 0 -> 3200 on first memory region
[cl3fr1:7342] *** An error occurred in MPI_Get
[cl3fr1:7342] *** reported by process [908197889,0]
[cl3fr1:7342] *** on win rdma window 3
[cl3fr1:7342] *** MPI_ERR_RMA_RANGE: invalid RMA address range
[cl3fr1:7342] *** MPI_ERRORS_ARE_FATAL (processes in this win will now 
abort,

[cl3fr1:7342] ***and potentially your MPI job)
===

Expected output is:
===
[0] MPI_Get 0 -> 100 on first memory region:
[0] Done.
[0] MPI_Get 0 -> 100 on second memory region:
[0] Done.
===

The code allocates a dynamic window and attaches two memory regions to 
it before accessing both memory regions using MPI_Get. With Open MPI 
2.0.0, access to both memory regions fails. Access to the first 
memory region only succeeds if the second memory region is not attached. 
With Open MPI 1.10.3, all MPI operations succeed.


Please let me know if you need any additional information or think that 
my code example is not standard compliant.


Best regards
Joseph


--
Dipl.-Inf. Joseph Schuchart
High Performance Computing Center Stuttgart (HLRS)
Nobelstr. 19
D-70569 Stuttgart

Tel.: +49(0)711-68565890
Fax: +49(0)711-6856832
E-Mail: schuch...@hlrs.de

/*
 * mpi_dynamic_win.cc
 *
 *  Created on: Aug 24, 2016
 *  Author: joseph
 */

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

static int allocate_shared(size_t bufsize, MPI_Win win, MPI_Aint *disp_set) {
  int ret;
  char *sub_mem;
  MPI_Aint disp;
  MPI_Info win_info;
  MPI_Info_create(&win_info);
  MPI_Info_set(win_info, "alloc_shared_noncontig", "true");

  sub_mem = malloc(bufsize * sizeof(char));

  /* Attach the allocated shared memory to the dynamic window */
  ret = MPI_Win_attach(win, sub_mem, bufsize);

  if (ret != MPI_SUCCESS) {
printf("MPI_Win_attach failed!\n");
return -1;
  }

  /* Get the local address */
  ret = MPI_Get_address(sub_mem, &disp);

  if (ret != MPI_SUCCESS) {
printf("MPI_Get_address failed!\n");
return -1;
  }

  /* Publish addresses */
  ret = MPI_Allgather(&disp, 1, MPI_AINT, disp_set, 1, MPI_AINT, MPI_COMM_WORLD);

  if (ret != MPI_SUCCESS) {
printf("MPI_Allgather failed!\n");
return -1;
  }

  MPI_Info_free(&win_info);

  return 0;
}

int main(int argc, char **argv)
{
  MPI_Win win;
  const size_t nelems = 10*10;
  const size_t bufsize = nelems * sizeof(double);
  MPI_Aint   *disp_set, *disp_set2;
  int rank, size;

  double buf[nelems];

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  disp_set  = (MPI_Aint*) malloc(size * sizeof(MPI_Aint));
  disp_set2 = (MPI_Aint*) malloc(size * sizeof(MPI_Aint));

  int ret = MPI_Win_create_dynamic(MPI_INFO_NULL, MPI_COMM_WORLD, &win);
  if (ret != MPI_SUCCESS) {
printf("MPI_Win_create_dynamic failed!\n");
exit(1);
  }

  
  MPI_Win_lock_all (0, win);

  /* Allocate two shared windows */
  allocate_shared(bufsize, win, disp_set);  
  allocate_shared(bufsize, win, disp_set2);  

  /* Initiate a get */
  {
int elem;
int neighbor = (rank + 1) / size;
if (rank == 0) printf("[%i] MPI_Get 0 -> %zu on first memory region: \n", rank, nelems);
for (elem = 0; elem < nelems -1; elem++) {
  MPI_Aint off = elem * sizeof(double);
  //MPI_Aint disp = MPI_Aint_add(disp_set[neighbor], off);
  MPI_Aint disp = disp_set[neighbor] + off;
  MPI_Get(&buf[elem], sizeof(double), MPI_BYTE, neighbor, disp, sizeof(double), MPI_BYTE, win);
}
MPI_Win_flush(neighbor, win);
if (rank == 0) printf("[%i] Done.\n", rank);
  }


  MPI_Barrier(MPI_COMM_WORLD);

  {
int elem;
int neighbor = (rank + 1) / size;
if (rank == 0) printf("[%i] MPI_Get 0 -> %zu on second memory region: \n", rank, nelems);
for (elem = 0; elem < nelems; elem++) {
  MPI_Aint off = elem * sizeof(double);
  //MPI_Aint disp = MPI_Aint_add(disp_set2[neighbor], off);
  MPI_Aint disp = disp_set[neighbor] + off;
  MPI_Get(&buf[elem], sizeof(double), MPI_BYTE, neighbor, disp, sizeof(double), MPI_BYTE, win);
}
MPI_Win_flush(neighbor, win);
if (rank == 0) printf("[%i] Done.\n", rank);
  }
  MPI_Barrier(MPI_COMM_WORLD);


  MPI_Win_unlock_all (win);

  free(disp_set);
  free(disp_set2);

  MPI_Finalize();
  return 0;
}

___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users