Re: [OMPI users] shared memory zero size segment

2016-02-11 Thread Peter Wind
You may be right semantically. But the sentence "the first address in the
memory segment of process i is consecutive with the last address in the memory
segment of process i - 1" is also not easy to interpret correctly for a zero
size segment.

There may be good reasons not to allocate a pointer for a zero size segment.
What I am trying to say is that a new user reading the documentation will not
expect this behaviour before trying it out.
Couldn't a short sentence in the documentation, like "the pointer should not be
used for zero size segments", clarify this?

Peter

- Original Message -
> 
> On Thu, Feb 11, 2016 at 02:17:40PM +, Peter Wind wrote:
> >    I would add that the present situation is bound to give problems for
> >    some users.
> >    It is natural to divide an array into segments, each process treating
> >    its own segment but also needing to read adjacent segments.
> >    MPI_Win_allocate_shared seems to be designed for this.
> >    This will work fine as long as no segment has size zero. It can also be
> >    expected that most testing is done with all segments larger than zero.
> >    The documentation adding "size = 0 is valid" would also make people
> >    confident that it will be consistent for that special case too.
> 
> Nope, that statement says it's OK for a rank to specify that the local
> shared memory segment is 0 bytes. Nothing more. The standard
> unfortunately does not define what pointer value is returned for a rank
> that specifies size = 0. Not sure if the RMA working group intentionally
> left that undefined... Anyway, Open MPI does not appear to be out of
> compliance with the standard here.
> 
> To be safe you should use MPI_Win_shared_query as suggested. You can
> pass MPI_PROC_NULL as the rank to get the pointer for the first non-zero
> sized segment in the shared memory window.
> 
> -Nathan
> HPC-5, LANL
> 
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2016/02/28506.php
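
For reference, a minimal C sketch of the pattern Nathan recommends above:
allocate the shared window, then query it with MPI_PROC_NULL and use that base
when the local segment has size zero. This is illustrative only (not code from
this thread) and omits error checking:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Rank 0 asks for a zero-size segment, every other rank for one int. */
    MPI_Aint mysize = (rank == 0) ? 0 : sizeof(int);
    int *mybase = NULL;
    MPI_Win win;
    MPI_Win_allocate_shared(mysize, sizeof(int), MPI_INFO_NULL,
                            MPI_COMM_WORLD, &mybase, &win);

    /* Do not rely on mybase when mysize == 0; the standard does not say
       what it contains.  Query the window instead. */
    MPI_Aint qsize;
    int qdisp;
    int *qbase = NULL;

    /* With MPI_PROC_NULL, the query returns the base (and size/disp_unit)
       of the first segment in the window that was created with size > 0. */
    MPI_Win_shared_query(win, MPI_PROC_NULL, &qsize, &qdisp, &qbase);

    printf("rank %d: first non-zero segment starts at %p (size %ld)\n",
           rank, (void *)qbase, (long)qsize);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}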


Re: [OMPI users] shared memory zero size segment

2016-02-11 Thread Peter Wind
Thanks Jeff, that was an interesting result. In your run the pointers are well
defined, even for the zero size segment.
However I can't reproduce your output. I still get null pointers (output
below).
(I tried both the 1.8.5 and 1.10.2 versions.)
What could be the difference?

Peter 

mpirun -np 4 a.out 
query: me=0, them=0, size=0, disp=1, base=(nil) 
query: me=0, them=1, size=4, disp=1, base=0x2aee280030d0 
query: me=0, them=2, size=4, disp=1, base=0x2aee280030d4 
query: me=0, them=3, size=4, disp=1, base=0x2aee280030d8 
query: me=0, them=PROC_NULL, size=4, disp=1, base=0x2aee280030d0 
query: me=1, them=0, size=0, disp=1, base=(nil) 
query: me=1, them=1, size=4, disp=1, base=0x2aabbb9ce0d0 
query: me=1, them=2, size=4, disp=1, base=0x2aabbb9ce0d4 
query: me=1, them=3, size=4, disp=1, base=0x2aabbb9ce0d8 
query: me=1, them=PROC_NULL, size=4, disp=1, base=0x2aabbb9ce0d0 
query: me=2, them=0, size=0, disp=1, base=(nil) 
query: me=2, them=1, size=4, disp=1, base=0x2b1579dd40d0 
query: me=2, them=2, size=4, disp=1, base=0x2b1579dd40d4 
query: me=2, them=3, size=4, disp=1, base=0x2b1579dd40d8 
query: me=2, them=PROC_NULL, size=4, disp=1, base=0x2b1579dd40d0 
query: me=3, them=0, size=0, disp=1, base=(nil) 
query: me=3, them=1, size=4, disp=1, base=0x2ac8d2c350d0 
query: me=3, them=2, size=4, disp=1, base=0x2ac8d2c350d4 
query: me=3, them=3, size=4, disp=1, base=0x2ac8d2c350d8 
query: me=3, them=PROC_NULL, size=4, disp=1, base=0x2ac8d2c350d0 
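
For comparison, a C query loop of roughly this shape would produce output in
the format above. It is only a sketch (not Jeff's attachment) and includes the
"if (size == 0) base = NULL;" guard Jeff suggests below:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int me, np;
    MPI_Comm_rank(MPI_COMM_WORLD, &me);
    MPI_Comm_size(MPI_COMM_WORLD, &np);

    /* Rank 0 contributes 0 bytes, every other rank one int (disp_unit 1). */
    MPI_Aint mysize = (me == 0) ? 0 : sizeof(int);
    char *mybase = NULL;
    MPI_Win win;
    MPI_Win_allocate_shared(mysize, 1, MPI_INFO_NULL,
                            MPI_COMM_WORLD, &mybase, &win);

    /* Query every rank, plus MPI_PROC_NULL for the first non-empty segment. */
    for (int them = 0; them <= np; them++) {
        int target = (them == np) ? MPI_PROC_NULL : them;
        MPI_Aint size;
        int disp;
        char *base = NULL;
        MPI_Win_shared_query(win, target, &size, &disp, &base);
        if (size == 0) base = NULL;   /* the guard suggested in the thread */
        if (target == MPI_PROC_NULL)
            printf("query: me=%d, them=PROC_NULL, size=%ld, disp=%d, base=%p\n",
                   me, (long)size, disp, (void *)base);
        else
            printf("query: me=%d, them=%d, size=%ld, disp=%d, base=%p\n",
                   me, target, (long)size, disp, (void *)base);
    }

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}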

- Original Message -

> See attached. Output below. Note that the base you get for ranks 0 and 1 is
> the same, so you need to use the fact that size=0 at rank=0 to know not to
> dereference that pointer and expect to be writing into rank 0's memory,
> since you will write into rank 1's.

> I would probably add "if (size==0) base=NULL;" for good measure.

> Jeff

> $ mpirun -n 4 ./a.out
> query: me=0, them=0, size=0, disp=1, base=0x10bd64000
> query: me=0, them=1, size=4, disp=1, base=0x10bd64000
> query: me=0, them=2, size=4, disp=1, base=0x10bd64004
> query: me=0, them=3, size=4, disp=1, base=0x10bd64008
> query: me=0, them=PROC_NULL, size=4, disp=1, base=0x10bd64000
> query: me=1, them=0, size=0, disp=1, base=0x102d3b000
> query: me=1, them=1, size=4, disp=1, base=0x102d3b000
> query: me=1, them=2, size=4, disp=1, base=0x102d3b004
> query: me=1, them=3, size=4, disp=1, base=0x102d3b008
> query: me=1, them=PROC_NULL, size=4, disp=1, base=0x102d3b000
> query: me=2, them=0, size=0, disp=1, base=0x10aac1000
> query: me=2, them=1, size=4, disp=1, base=0x10aac1000
> query: me=2, them=2, size=4, disp=1, base=0x10aac1004
> query: me=2, them=3, size=4, disp=1, base=0x10aac1008
> query: me=2, them=PROC_NULL, size=4, disp=1, base=0x10aac1000
> query: me=3, them=0, size=0, disp=1, base=0x100fa2000
> query: me=3, them=1, size=4, disp=1, base=0x100fa2000
> query: me=3, them=2, size=4, disp=1, base=0x100fa2004
> query: me=3, them=3, size=4, disp=1, base=0x100fa2008
> query: me=3, them=PROC_NULL, size=4, disp=1, base=0x100fa2000

> On Thu, Feb 11, 2016 at 8:55 AM, Jeff Hammond < jeff.scie...@gmail.com >
> wrote:
> > On Thu, Feb 11, 2016 at 8:46 AM, Nathan Hjelm < hje...@lanl.gov > wrote:
> > > On Thu, Feb 11, 2016 at 02:17:40PM +, Peter Wind wrote:
> > > > I would add that the present situation is bound to give problems for
> > > > some users.
> > > > It is natural to divide an array into segments, each process treating
> > > > its own segment but also needing to read adjacent segments.
> > > > MPI_Win_allocate_shared seems to be designed for this.
> > > > This will work fine as long as no segment has size zero. It can also
> > > > be expected that most testing is done with all segments larger than
> > > > zero.
> > > > The documentation adding "size = 0 is valid" would also make people
> > > > confident that it will be consistent for that special case too.
> > >
> > > Nope, that statement says it's OK for a rank to specify that the local
> > > shared memory segment is 0 bytes. Nothing more. The standard
> > > unfortunately does not define what pointer value is returned for a rank
> > > that specifies size = 0. Not sure if the RMA working group intentionally
> > > left that undefined... Anyway, Open MPI does not appear to be out of
> > > compliance with the standard here.
> >
> > MPI_Alloc_mem doesn't say what happens if you pass size=0 either. The RMA
> > working group intentionally tries to maintain co

Re: [OMPI users] shared memory zero size segment

2016-02-11 Thread Peter Wind
I would add that the present situation is bound to give problems for some
users.

It is natural to divide an array into segments, each process treating its own
segment but also needing to read adjacent segments.
MPI_Win_allocate_shared seems to be designed for this.
This will work fine as long as no segment has size zero. It can also be
expected that most testing is done with all segments larger than zero.
The documentation adding "size = 0 is valid" would also make people confident
that it will be consistent for that special case too.
Then, long down the road in the development of a particular code, some special
case will use a segment of size zero, and it will be hard to trace that error
back to the MPI library.

Peter 
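
To make the use case concrete, here is a hedged C sketch of that
adjacent-segment access, written so it stays correct even when some segment
has size zero (the sizes and names are illustrative, not code from this
thread):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int me, np;
    MPI_Comm_rank(MPI_COMM_WORLD, &me);
    MPI_Comm_size(MPI_COMM_WORLD, &np);

    /* Each rank owns "count" ints of a distributed array; rank 0 owns none. */
    MPI_Aint count = (me == 0) ? 0 : 1;
    int *mine = NULL;
    MPI_Win win;
    MPI_Win_allocate_shared(count * (MPI_Aint)sizeof(int), sizeof(int),
                            MPI_INFO_NULL, MPI_COMM_WORLD, &mine, &win);

    MPI_Win_fence(0, win);
    if (count > 0) mine[0] = me;      /* fill my own segment */
    MPI_Win_fence(0, win);            /* make the stores visible everywhere */

    /* Read the right-hand neighbour through MPI_Win_shared_query instead of
       doing pointer arithmetic on my own (possibly null) base pointer. */
    int right = me + 1;
    if (right < np) {
        MPI_Aint rsize;
        int rdisp;
        int *rbase = NULL;
        MPI_Win_shared_query(win, right, &rsize, &rdisp, &rbase);
        if (rsize > 0)
            printf("rank %d reads neighbour %d's first element: %d\n",
                   me, right, rbase[0]);
    }

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}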


- Original Message -



Yes, that is what I meant. 

Enclosed is a C example. 
The point is that the code would logically make sense for task 0, but since it 
asks for a segment of size=0, it only gets a null pointer, which cannot be used 
to access the shared parts. 

Peter 

- Original Message -


I think Peter's point is that if
- the window uses contiguous memory
*and*
- all tasks know how much memory was allocated by all other tasks in the
window
then it could/should be possible to get rid of MPI_Win_shared_query

that is likely true if no task allocates zero bytes.
now, if a task allocates zero bytes, MPI_Win_allocate_shared could return a
null pointer and hence make MPI_Win_shared_query usage mandatory.

in his example, task 0 allocates zero bytes, so he was expecting the returned
pointer on task zero to point to the memory allocated by task 1.

if "may enable" should be read as "does enable", then returning a null pointer
can be seen as a bug.
if "may enable" can be read as "does not always enable", then returning a null
pointer is compliant with the standard.

I am clearly not good at reading/interpreting the standard, so using
MPI_Win_shared_query is my recommended way to get it to work.
(feel free to call it "bulletproof", "overkill", or even "right") 

Cheers, 

Gilles 

On Thursday, February 11, 2016, Jeff Hammond < jeff.scie...@gmail.com > wrote: 





On Wed, Feb 10, 2016 at 8:44 AM, Peter Wind < peter.w...@met.no > wrote: 



I agree that in practice the best approach is to use Win_shared_query.

Still I am confused by this part in the documentation: 
"The allocated memory is contiguous across process ranks unless the info key 
alloc_shared_noncontig is specified. Contiguous across process ranks means that 
the first address in the memory segment of process i is consecutive with the 
last address in the memory segment of process i - 1. This may enable the user 
to calculate remote address offsets with local information only." 

Isn't this an encouragement to use the pointer of Win_allocate_shared directly? 





No, it is not. Win_allocate_shared only gives you the pointer to the portion of 
the allocation that is owned by the calling process. If you want to access the 
whole slab, call Win_shared_query(..,rank=0,..) and use the resulting baseptr. 

I attempted to modify your code to be more correct, but I don't know enough 
Fortran to get it right. If you can parse C examples, I'll provide some of 
those. 

Jeff 




Peter 





I don't know about bulletproof, but Win_shared_query is the *only* valid way to 
get the addresses of memory in other processes associated with a window. 

The default for Win_allocate_shared is contiguous memory, but it can and likely 
will be mapped differently into each process, in which case only relative 
offsets are transferrable. 

Jeff 

On Wed, Feb 10, 2016 at 4:19 AM, Gilles Gouaillardet < 
gilles.gouaillar...@gmail.com > wrote: 


Peter, 

The bulletproof way is to use MPI_Win_shared_query after 
MPI_Win_allocate_shared. 
I do not know if current behavior is a bug or a feature... 

Cheers, 

Gilles 


On Wednesday, February 10, 2016, Peter Wind < peter.w...@met.no > wrote: 


Hi, 

Under Fortran, MPI_Win_allocate_shared is called with a window size of zero for
some processes.
The output pointer is then not valid for these processes (null pointer).
Did I understand this wrongly? Shouldn't the pointers be contiguous, so that
for a zero sized window, the pointer would point to the start of the segment
of the next rank?
The documentation explicitly specifies "size = 0 is valid".

Attached is a small code where rank 0 allocates a window of size zero. All the
other ranks get valid pointers, except rank 0.

Best regards, 
Peter 
___ 
users mailing list 
us...@open-mpi.org 
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users 
Link to this post: 
http://www.open-mpi.org/community/lists/users/2016/02/28485.php 





Re: [OMPI users] shared memory zero size segment

2016-02-11 Thread Peter Wind
Yes, that is what I meant. 

Enclosed is a C example. 
The point is that the code would logically make sense for task 0, but since it 
asks for a segment of size=0, it only gets a null pointer, which cannot be used 
to access the shared parts. 

Peter 

- Original Message -

> I think Peter's point is that if
> - the window uses contiguous memory
> *and*
> - all tasks know how much memory was allocated by all other tasks in the
> window
> then it could/should be possible to get rid of MPI_Win_shared_query

> that is likely true if no task allocates zero bytes.
> now, if a task allocates zero bytes, MPI_Win_allocate_shared could return a
> null pointer and hence make MPI_Win_shared_query usage mandatory.

> in his example, task 0 allocates zero bytes, so he was expecting the returned
> pointer on task zero to point to the memory allocated by task 1.

> if "may enable" should be read as "does enable", then returning a null
> pointer can be seen as a bug.
> if "may enable" can be read as "does not always enable", then returning a
> null pointer is compliant with the standard.

> I am clearly not good at reading/interpreting the standard, so using
> MPI_Win_shared_query is my recommended way to get it to work.
> (feel free to call it "bulletproof", "overkill", or even "right")

> Cheers,

> Gilles

> On Thursday, February 11, 2016, Jeff Hammond < jeff.scie...@gmail.com >
> wrote:
> > On Wed, Feb 10, 2016 at 8:44 AM, Peter Wind < peter.w...@met.no > wrote:
> > > I agree that in practice the best approach is to use Win_shared_query.
> > > Still I am confused by this part in the documentation:
> > > "The allocated memory is contiguous across process ranks unless the info
> > > key alloc_shared_noncontig is specified. Contiguous across process ranks
> > > means that the first address in the memory segment of process i is
> > > consecutive with the last address in the memory segment of process i - 1.
> > > This may enable the user to calculate remote address offsets with local
> > > information only."
> > > Isn't this an encouragement to use the pointer of Win_allocate_shared
> > > directly?
> >
> > No, it is not. Win_allocate_shared only gives you the pointer to the
> > portion of the allocation that is owned by the calling process. If you
> > want to access the whole slab, call Win_shared_query(..,rank=0,..) and use
> > the resulting baseptr.
> > I attempted to modify your code to be more correct, but I don't know
> > enough Fortran to get it right. If you can parse C examples, I'll provide
> > some of those.
> > Jeff
> > > Peter
> > > > I don't know about bulletproof, but Win_shared_query is the *only*
> > > > valid way to get the addresses of memory in other processes associated
> > > > with a window.
> > > > The default for Win_allocate_shared is contiguous memory, but it can
> > > > and likely will be mapped differently into each process, in which case
> > > > only relative offsets are transferrable.
> > > > Jeff
> > > > On Wed, Feb 10, 2016 at 4:19 AM, Gilles Gouaillardet <
> > > > gilles.gouaillar...@gmail.com > wrote:
> > > > > Peter,
> > > > > The bulletproof way is to use MPI_Win_shared_query after
> > > > > MPI_Win_allocate_shared.
> > > > > I do not know if current behavior is a bug or a feature...
> > > > > Cheers,
> > > > > Gilles
> > > > > On Wednesday, February 10, 2016, Peter Wind < peter.w...@met.no >
> > > > > wrote:
> > > > > > Hi,
> > > > > > Under fortran, MPI_Win_allocate_shared is called with a window size
> > > > > > of zero f

Re: [OMPI users] shared memory zero size segment

2016-02-10 Thread Peter Wind
I agree that in practice the best approach is to use Win_shared_query.

Still I am confused by this part in the documentation: 
"The allocated memory is contiguous across process ranks unless the info key 
alloc_shared_noncontig is specified. Contiguous across process ranks means that 
the first address in the memory segment of process i is consecutive with the 
last address in the memory segment of process i - 1. This may enable the user 
to calculate remote address offsets with local information only." 

Isn't this an encouragement to use the pointer of Win_allocate_shared directly? 

Peter 
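
A hedged C sketch of the whole-slab access Jeff describes in the quote below,
i.e. query rank 0 once and index from its base. It assumes the default
contiguous layout and that every rank contributes a non-empty segment
(illustrative only, not code from this thread):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int me, np;
    MPI_Comm_rank(MPI_COMM_WORLD, &me);
    MPI_Comm_size(MPI_COMM_WORLD, &np);

    /* Every rank contributes one int, so rank 0's segment is non-empty. */
    int *mine = NULL;
    MPI_Win win;
    MPI_Win_allocate_shared(sizeof(int), sizeof(int), MPI_INFO_NULL,
                            MPI_COMM_WORLD, &mine, &win);

    *mine = 100 + me;                 /* fill my own segment */
    MPI_Win_fence(0, win);            /* make the stores visible everywhere */

    /* The base returned by MPI_Win_allocate_shared is only "my" segment.
       To walk the whole contiguous slab, query rank 0 and index from there. */
    MPI_Aint size0;
    int disp0;
    int *slab = NULL;
    MPI_Win_shared_query(win, 0, &size0, &disp0, &slab);

    if (me == 0) {
        for (int r = 0; r < np; r++)
            printf("slab[%d] = %d\n", r, slab[r]);
    }

    MPI_Win_fence(0, win);
    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}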

- Original Message -

> I don't know about bulletproof, but Win_shared_query is the *only* valid way
> to get the addresses of memory in other processes associated with a window.

> The default for Win_allocate_shared is contiguous memory, but it can and
> likely will be mapped differently into each process, in which case only
> relative offsets are transferrable.

> Jeff

> On Wed, Feb 10, 2016 at 4:19 AM, Gilles Gouaillardet <
> gilles.gouaillar...@gmail.com > wrote:

> > Peter,
> 

> > The bulletproof way is to use MPI_Win_shared_query after
> > MPI_Win_allocate_shared.
> 
> > I do not know if current behavior is a bug or a feature...
> 

> > Cheers,
> 

> > Gilles
> 

> > On Wednesday, February 10, 2016, Peter Wind < peter.w...@met.no > wrote:
> 

> > > Hi,
> > 
> 

> > > Under fortran, MPI_Win_allocate_shared is called with a window size of
> > > zero
> > > for some processes.
> > 
> 
> > > The output pointer is then not valid for these processes (null pointer).
> > 
> 
> > > Did I understand this wrongly? Shouldn't the pointers be contiguous, so
> > > that for a zero sized window, the pointer would point to the start of
> > > the segment of the next rank?
> > 
> 
> > > The documentation explicitly specifies "size = 0 is valid".
> > 
> 

> > > Attached is a small code where rank 0 allocates a window of size zero.
> > > All the other ranks get valid pointers, except rank 0.
> > 
> 

> > > Best regards,
> > 
> 
> > > Peter
> > 
> 
> > > ___
> > 
> 
> > > users mailing list
> > 
> 
> > > us...@open-mpi.org
> > 
> 
> > > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> > 
> 
> > > Link to this post:
> > > http://www.open-mpi.org/community/lists/users/2016/02/28485.php
> > 
> 

> > ___
> 
> > users mailing list
> 
> > us...@open-mpi.org
> 
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> > Link to this post:
> > http://www.open-mpi.org/community/lists/users/2016/02/28493.php
> 

> --
> Jeff Hammond
> jeff.scie...@gmail.com
> http://jeffhammond.github.io/

> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2016/02/28496.php

Re: [OMPI users] error openmpi check hdf5

2016-02-10 Thread Peter Wind
Sorry, this was the wrong thread! Please disregard the last answer (1.8.5 and
1.10.2...).

Peter 

- Original Message -

> I have tested 1.8.5 and 1.10.2; both fail (with both the Intel and GNU
> compilers).

> Peter

> - Original Message -

> > which version of Open MPI is this?
> 
> > Thanks
> 
> > Edgar
> 

> > On 2/10/2016 4:13 AM, Delphine Ramalingom wrote:
> 

> > > Hello,
> > 
> 

> > > I am trying to compile a parallel version of hdf5.
> > 
> 
> > > I get error messages when I run the checks with Open MPI.
> > 
> 

> > > HDF5 support told me that the errors seem related to the new ompio
> > > implementation for MPI-I/O inside Open MPI. He suggested that I ask on
> > > the OMPI mailing list to resolve these errors.
> > 
> 

> > > For information, my version of openmpi is : gcc (GCC) 4.8.2
> > 
> 
> > > mpicc --showme
> > 
> 
> > > gcc -I/programs/Compilateurs2/usr/include -pthread -Wl,-rpath
> > > -Wl,/programs/Compilateurs2/usr/lib -Wl,--enable-new-dtags
> > > -L/programs/Compilateurs2/usr/lib -lmpi
> > 
> 

> > > Errors are :
> > 
> 

> > > .../hdf5-1.8.16_gnu/testpar/.libs/t_mpi: symbol lookup error:
> > > /programs/Compilateurs2/usr/lib/openmpi/mca_fcoll_two_phase.so: undefined
> > > symbol: ompi_io_ompio_decode_datatype
> > 
> 
> > > .../hdf5-1.8.16_gnu/testpar/.libs/t_mpi: symbol lookup error:
> > > /programs/Compilateurs2/usr/lib/openmpi/mca_fcoll_two_phase.so: undefined
> > > symbol: ompi_io_ompio_decode_datatype
> > 
> 
> > > ---
> > 
> 
> > > Primary job terminated normally, but 1 process returned
> > 
> 
> > > a non-zero exit code.. Per user-direction, the job has been aborted.
> > 
> 
> > > ---
> > 
> 
> > > .../hdf5-1.8.16_gnu/testpar/.libs/t_mpi: symbol lookup error:
> > > /programs/Compilateurs2/usr/lib/openmpi/mca_fcoll_two_phase.so: undefined
> > > symbol: ompi_io_ompio_set_aggregator_props
> > 
> 
> > > .../hdf5-1.8.16_gnu/testpar/.libs/t_mpi: symbol lookup error:
> > > /programs/Compilateurs2/usr/lib/openmpi/mca_fcoll_two_phase.so: undefined
> > > symbol: ompi_io_ompio_set_aggregator_props
> > 
> 
> > > .../hdf5-1.8.16_gnu/testpar/.libs/t_mpi: symbol lookup error:
> > > /programs/Compilateurs2/usr/lib/openmpi/mca_fcoll_two_phase.so: undefined
> > > symbol: ompi_io_ompio_set_aggregator_props
> > 
> 

> > > Thanks in advance for your help.
> > 
> 

> > > Regards
> > 
> 
> > > Delphine
> > 
> 

> > > --
> > 
> 
> > > Delphine Ramalingom Barbary | Ingénieure en Calcul Scientifique
> > 
> 
> > > Direction des Usages du Numérique (DUN)
> > 
> 
> > > Centre de Développement du Calcul Scientifique
> > 
> 
> > > TEL : 02 62 93 84 87- FAX : 02 62 93 81 06
> > 
> 

> > --
> 
> > Edgar Gabriel
> 
> > Associate Professor
> 
> > Parallel Software Technologies Lab http://pstl.cs.uh.edu Department of
> > Computer Science  University of Houston
> 
> > Philip G. Hoffman Hall, Room 524Houston, TX-77204, USA
> 
> > Tel: +1 (713) 743-3857  Fax: +1 (713) 743-3335
> 
> > --
> 

> > ___
> 
> > users mailing list
> 
> > us...@open-mpi.org
> 
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> > Link to this post:
> > http://www.open-mpi.org/community/lists/users/2016/02/28489.php
> 
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2016/02/28490.php

Re: [OMPI users] error openmpi check hdf5

2016-02-10 Thread Peter Wind
I have tested 1.8.5 and 1.10.2; both fail (with both the Intel and GNU
compilers).

Peter 

- Original Message -

> which version of Open MPI is this?
> Thanks
> Edgar

> On 2/10/2016 4:13 AM, Delphine Ramalingom wrote:

> > Hello,
> 

> > I am trying to compile a parallel version of hdf5.
> 
> > I get error messages when I run the checks with Open MPI.
> 

> > HDF5 support told me that the errors seem related to the new ompio
> > implementation for MPI-I/O inside Open MPI. He suggested that I ask on the
> > OMPI mailing list to resolve these errors.
> 

> > For information, my version of openmpi is : gcc (GCC) 4.8.2
> 
> > mpicc --showme
> 
> > gcc -I/programs/Compilateurs2/usr/include -pthread -Wl,-rpath
> > -Wl,/programs/Compilateurs2/usr/lib -Wl,--enable-new-dtags
> > -L/programs/Compilateurs2/usr/lib -lmpi
> 

> > Errors are :
> 

> > .../hdf5-1.8.16_gnu/testpar/.libs/t_mpi: symbol lookup error:
> > /programs/Compilateurs2/usr/lib/openmpi/mca_fcoll_two_phase.so: undefined
> > symbol: ompi_io_ompio_decode_datatype
> 
> > .../hdf5-1.8.16_gnu/testpar/.libs/t_mpi: symbol lookup error:
> > /programs/Compilateurs2/usr/lib/openmpi/mca_fcoll_two_phase.so: undefined
> > symbol: ompi_io_ompio_decode_datatype
> 
> > ---
> 
> > Primary job terminated normally, but 1 process returned
> 
> > a non-zero exit code.. Per user-direction, the job has been aborted.
> 
> > ---
> 
> > .../hdf5-1.8.16_gnu/testpar/.libs/t_mpi: symbol lookup error:
> > /programs/Compilateurs2/usr/lib/openmpi/mca_fcoll_two_phase.so: undefined
> > symbol: ompi_io_ompio_set_aggregator_props
> 
> > .../hdf5-1.8.16_gnu/testpar/.libs/t_mpi: symbol lookup error:
> > /programs/Compilateurs2/usr/lib/openmpi/mca_fcoll_two_phase.so: undefined
> > symbol: ompi_io_ompio_set_aggregator_props
> 
> > .../hdf5-1.8.16_gnu/testpar/.libs/t_mpi: symbol lookup error:
> > /programs/Compilateurs2/usr/lib/openmpi/mca_fcoll_two_phase.so: undefined
> > symbol: ompi_io_ompio_set_aggregator_props
> 

> > Thanks in advance for your help.
> 

> > Regards
> 
> > Delphine
> 

> > --
> 
> > Delphine Ramalingom Barbary | Ingénieure en Calcul Scientifique
> 
> > Direction des Usages du Numérique (DUN)
> 
> > Centre de Développement du Calcul Scientifique
> 
> > TEL : 02 62 93 84 87- FAX : 02 62 93 81 06
> 

> --
> Edgar Gabriel
> Associate Professor
> Parallel Software Technologies Lab http://pstl.cs.uh.edu Department of
> Computer Science  University of Houston
> Philip G. Hoffman Hall, Room 524Houston, TX-77204, USA
> Tel: +1 (713) 743-3857  Fax: +1 (713) 743-3335
> --

> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2016/02/28489.php

Re: [OMPI users] shared memory zero size segment

2016-02-10 Thread Peter Wind
Sorry for that, here is the attachment!

Peter

- Original Message -
> Peter --
> 
> Somewhere along the way, your attachment got lost.  Could you re-send?
> 
> Thanks.
> 
> 
> > On Feb 10, 2016, at 5:56 AM, Peter Wind  wrote:
> > 
> > Hi,
> > 
> > Under fortran, MPI_Win_allocate_shared is called with a window size of zero
> > for some processes.
> > The output pointer is then not valid for these processes (null pointer).
> > Did I understand this wrongly? Shouldn't the pointers be contiguous, so
> > that for a zero sized window, the pointer would point to the start of the
> > segment of the next rank?
> > The documentation explicitly specifies "size = 0 is valid".
> > 
> > Attached is a small code where rank 0 allocates a window of size zero. All
> > the other ranks get valid pointers, except rank 0.
> > 
> > Best regards,
> > Peter
> > ___
> > users mailing list
> > us...@open-mpi.org
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> > Link to this post:
> > http://www.open-mpi.org/community/lists/users/2016/02/28485.php
> 
> 
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2016/02/28486.php
> 
program sharetest

! test zero size segment.
! run on at least 3 cpus
! mpirun -np 4 a.out

   use mpi

   use, intrinsic :: iso_c_binding, only : c_ptr, c_f_pointer

   implicit none


   integer, parameter :: nsize = 20
   integer, pointer   :: array(:)
   integer:: num_procs
   integer:: ierr
   integer:: irank, irank_group
   integer:: win
   integer:: disp_unit
   type(c_ptr):: cp1
   type(c_ptr):: cp2

   integer(MPI_ADDRESS_KIND) :: win_size
   integer(MPI_ADDRESS_KIND) :: segment_size

   call MPI_Init(ierr)
   call MPI_Comm_size(MPI_COMM_WORLD, num_procs, ierr)
   call MPI_Comm_rank(MPI_COMM_WORLD, irank, ierr)

   disp_unit = sizeof(1)

   win_size = irank*disp_unit

   call MPI_Win_allocate_shared(win_size, disp_unit, MPI_INFO_NULL, MPI_COMM_WORLD, cp1, win, ierr)

!   write(*,*)'rank ', irank,', pointer ',cp1

  call c_f_pointer(cp1, array, [nsize])

77 format(4(A,I3))

   if(irank/=0)then
  array(1)=irank
  CALL MPI_BARRIER(MPI_COMM_WORLD, ierr)
  if(irank/=num_procs-1)then
 print 77, ' rank', irank, ':  array(1)', array(1),' shared with next rank: ',array(irank+1)
  else
 print 77, ' rank', irank, ':  array(1)', array(1),' shared with previous rank: ',array(0)
  endif
  CALL MPI_BARRIER(MPI_COMM_WORLD, ierr)
   else
 CALL MPI_BARRIER(MPI_COMM_WORLD, ierr)
  CALL MPI_BARRIER(MPI_COMM_WORLD, ierr)
 if(.not.associated(array))then
print 77, 'zero pointer found, rank', irank
 else
print 77, ' rank', irank, ' array associated '
print 77, ' rank', irank, ':  array(1) ', array(1),' shared with next rank: ',array(irank+1)
 endif
   endif


   call MPI_Finalize(ierr)

 end program sharetest


Re: [OMPI users] shared memory zero size segment

2016-02-10 Thread Peter Wind


- Original Message -
> Peter --
> 
> Somewhere along the way, your attachment got lost.  Could you re-send?
> 
> Thanks.
> 
> 
> > On Feb 10, 2016, at 5:56 AM, Peter Wind  wrote:
> > 
> > Hi,
> > 
> > Under fortran, MPI_Win_allocate_shared is called with a window size of zero
> > for some processes.
> > The output pointer is then not valid for these processes (null pointer).
> > Did I understand this wrongly? Shouldn't the pointers be contiguous, so
> > that for a zero sized window, the pointer would point to the start of the
> > segment of the next rank?
> > The documentation explicitly specifies "size = 0 is valid".
> > 
> > Attached is a small code where rank 0 allocates a window of size zero. All
> > the other ranks get valid pointers, except rank 0.
> > 
> > Best regards,
> > Peter
> > ___
> > users mailing list
> > us...@open-mpi.org
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> > Link to this post:
> > http://www.open-mpi.org/community/lists/users/2016/02/28485.php
> 
> 
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2016/02/28486.php
> 
program sharetest

! test zero size segment.
! run on at least 3 cpus
! mpirun -np 4 a.out

   use mpi

   use, intrinsic :: iso_c_binding, only : c_ptr, c_f_pointer

   implicit none


   integer, parameter :: nsize = 20
   integer, pointer   :: array(:)
   integer:: num_procs
   integer:: ierr
   integer:: irank, irank_group
   integer:: win
   integer:: disp_unit
   type(c_ptr):: cp1
   type(c_ptr):: cp2

   integer(MPI_ADDRESS_KIND) :: win_size
   integer(MPI_ADDRESS_KIND) :: segment_size

   call MPI_Init(ierr)
   call MPI_Comm_size(MPI_COMM_WORLD, num_procs, ierr)
   call MPI_Comm_rank(MPI_COMM_WORLD, irank, ierr)

   disp_unit = sizeof(1)

   win_size = irank*disp_unit

   call MPI_Win_allocate_shared(win_size, disp_unit, MPI_INFO_NULL, MPI_COMM_WORLD, cp1, win, ierr)

!   write(*,*)'rank ', irank,', pointer ',cp1

  call c_f_pointer(cp1, array, [nsize])

77 format(4(A,I3))

   if(irank/=0)then
  array(1)=irank
  CALL MPI_BARRIER(MPI_COMM_WORLD, ierr)
  if(irank/=num_procs-1)then
 print 77, ' rank', irank, ':  array(1)', array(1),' shared with next rank: ',array(irank+1)
  else
 print 77, ' rank', irank, ':  array(1)', array(1),' shared with previous rank: ',array(0)
  endif
  CALL MPI_BARRIER(MPI_COMM_WORLD, ierr)
   else
 CALL MPI_BARRIER(MPI_COMM_WORLD, ierr)
  CALL MPI_BARRIER(MPI_COMM_WORLD, ierr)
 if(.not.associated(array))then
print 77, 'zero pointer found, rank', irank
 else
print 77, ' rank', irank, ' array associated '
print 77, ' rank', irank, ':  array(1) ', array(1),' shared with next rank: ',array(irank+1)
 endif
   endif


   call MPI_Finalize(ierr)

 end program sharetest


[OMPI users] shared memory zero size segment

2016-02-10 Thread Peter Wind
Hi,

Under Fortran, MPI_Win_allocate_shared is called with a window size of zero for
some processes.
The output pointer is then not valid for these processes (null pointer).
Did I understand this wrongly? Shouldn't the pointers be contiguous, so that
for a zero sized window, the pointer would point to the start of the segment
of the next rank?
The documentation explicitly specifies "size = 0 is valid".

Attached is a small code where rank 0 allocates a window of size zero. All the
other ranks get valid pointers, except rank 0.

Best regards,
Peter


Re: [OMPI users] shared memory under fortran, bug?

2016-02-02 Thread Peter Wind
That worked!

That is, with the change you proposed the code gives the right result.

That was efficient work, thank you Gilles :)

Best wishes, 
Peter 

- Original Message -

> Thanks Peter,

> that is quite unexpected ...

> let's try another workaround, can you replace

> integer:: comm_group
> with

> integer:: comm_group, comm_tmp

> and
> call MPI_COMM_SPLIT(comm, irank*2/num_procs, irank, comm_group, ierr)

> with

> call MPI_COMM_SPLIT(comm, irank*2/num_procs, irank, comm_tmp, ierr)
> if (irank < (num_procs/2)) then
> comm_group = comm_tmp
> else
> call MPI_Comm_dup(comm_tmp, comm_group, ierr)
> endif

> if it works, I will make a fix tomorrow when I can access my workstation.
> if not, can you please run
> mpirun --mca osc_base_verbose 100 ...
> and post the output?

> I will then try to reproduce the issue and investigate it.

> Cheers,

> Gilles

> On Tuesday, February 2, 2016, Peter Wind < peter.w...@met.no > wrote:

> > Thanks Gilles,
> 

> > I get the following output (I guess it is not what you wanted?).
> 

> > Peter
> 

> > $ mpirun --mca osc pt2pt -np 4 a.out
> 
> > --
> 
> > A requested component was not found, or was unable to be opened. This
> 
> > means that this component is either not installed or is unable to be
> 
> > used on your system (e.g., sometimes this means that shared libraries
> 
> > that the component requires are unable to be found/loaded). Note that
> 
> > Open MPI stopped checking at the first component that it did not find.
> 

> > Host: stallo-2.local
> 
> > Framework: osc
> 
> > Component: pt2pt
> 
> > --
> 
> > --
> 
> > It looks like MPI_INIT failed for some reason; your parallel process is
> 
> > likely to abort. There are many reasons that a parallel process can
> 
> > fail during MPI_INIT; some of which are due to configuration or environment
> 
> > problems. This failure appears to be an internal failure; here's some
> 
> > additional information (which may only be relevant to an Open MPI
> 
> > developer):
> 

> > ompi_osc_base_open() failed
> 
> > --> Returned "Not found" (-13) instead of "Success" (0)
> 
> > --
> 
> > *** An error occurred in MPI_Init
> 
> > *** on a NULL communicator
> 
> > *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> 
> > *** and potentially your MPI job)
> 
> > [stallo-2.local:38415] Local abort before MPI_INIT completed successfully;
> > not able to aggregate error messages, and not able to guarantee that all
> > other processes were killed!
> 
> > *** An error occurred in MPI_Init
> 
> > *** on a NULL communicator
> 
> > *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> 
> > *** and potentially your MPI job)
> 
> > [stallo-2.local:38418] Local abort before MPI_INIT completed successfully;
> > not able to aggregate error messages, and not able to guarantee that all
> > other processes were killed!
> 
> > *** An error occurred in MPI_Init
> 
> > *** on a NULL communicator
> 
> > *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> 
> > *** and potentially your MPI job)
> 
> > [stallo-2.local:38416] Local abort before MPI_INIT completed successfully;
> > not able to aggregate error messages, and not able to guarantee that all
> > other processes were killed!
> 
> > *** An error occurred in MPI_Init
> 
> > *** on a NULL communicator
> 
> > *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> 
> > *** and potentially your MPI job)
> 
> > [stallo-2.local:38417] Local abort before MPI_INIT completed successfully;
> > not able to aggregate error messages, and not able to guarantee that all
> > other processes were killed!
> 
> > ---
> 
> > Primary job terminated normally, but 1 process returned
> 
> > a non-zero exit code.. Per user-direction, the job has been aborted.
> 
> > ---
> 
> > --
> 
> > mpirun detected that one or more processes exited with no

Re: [OMPI users] shared memory under fortran, bug?

2016-02-02 Thread Peter Wind
Thanks Gilles, 

I get the following output (I guess it is not what you wanted?). 

Peter 

$ mpirun --mca osc pt2pt -np 4 a.out 
-- 
A requested component was not found, or was unable to be opened. This 
means that this component is either not installed or is unable to be 
used on your system (e.g., sometimes this means that shared libraries 
that the component requires are unable to be found/loaded). Note that 
Open MPI stopped checking at the first component that it did not find. 

Host: stallo-2.local 
Framework: osc 
Component: pt2pt 
-- 
-- 
It looks like MPI_INIT failed for some reason; your parallel process is 
likely to abort. There are many reasons that a parallel process can 
fail during MPI_INIT; some of which are due to configuration or environment 
problems. This failure appears to be an internal failure; here's some 
additional information (which may only be relevant to an Open MPI 
developer): 

ompi_osc_base_open() failed 
--> Returned "Not found" (-13) instead of "Success" (0) 
-- 
*** An error occurred in MPI_Init 
*** on a NULL communicator 
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort, 
*** and potentially your MPI job) 
[stallo-2.local:38415] Local abort before MPI_INIT completed successfully; not 
able to aggregate error messages, and not able to guarantee that all other 
processes were killed! 
*** An error occurred in MPI_Init 
*** on a NULL communicator 
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort, 
*** and potentially your MPI job) 
[stallo-2.local:38418] Local abort before MPI_INIT completed successfully; not 
able to aggregate error messages, and not able to guarantee that all other 
processes were killed! 
*** An error occurred in MPI_Init 
*** on a NULL communicator 
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort, 
*** and potentially your MPI job) 
[stallo-2.local:38416] Local abort before MPI_INIT completed successfully; not 
able to aggregate error messages, and not able to guarantee that all other 
processes were killed! 
*** An error occurred in MPI_Init 
*** on a NULL communicator 
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort, 
*** and potentially your MPI job) 
[stallo-2.local:38417] Local abort before MPI_INIT completed successfully; not 
able to aggregate error messages, and not able to guarantee that all other 
processes were killed! 
--- 
Primary job terminated normally, but 1 process returned 
a non-zero exit code.. Per user-direction, the job has been aborted. 
--- 
-- 
mpirun detected that one or more processes exited with non-zero status, thus 
causing 
the job to be terminated. The first process to do so was: 

Process name: [[52507,1],0] 
Exit code: 1 
-- 
[stallo-2.local:38410] 3 more processes have sent help message 
help-mca-base.txt / find-available:not-valid 
[stallo-2.local:38410] Set MCA parameter "orte_base_help_aggregate" to 0 to see 
all help / error messages 
[stallo-2.local:38410] 2 more processes have sent help message help-mpi-runtime 
/ mpi_init:startup:internal-failure 

- Original Message -

> Peter,

> at first glance, your test program looks correct.

> can you please try to run
> mpirun --mca osc pt2pt -np 4 ...

> I might have identified a bug with the sm osc component.

> Cheers,

> Gilles

> On Tuesday, February 2, 2016, Peter Wind < peter.w...@met.no > wrote:

> > Enclosed is a short (< 100 lines) fortran code example that uses shared
> > memory.
> 
> > It seems to me it behaves wrongly if openmpi is used.
> 
> > Compiled with SGI/mpt , it gives the right result.
> 

> > To fail, the code must be run on a single node.
> 
> > It creates two groups of 2 processes each. Within each group memory is
> > shared.
> 
> > The error is that the two groups get the same memory allocated, but they
> > should not.
> 

> > Tested with openmpi 1.8.4, 1.8.5, 1.10.2 and gfortran, intel 13.0, intel
> > 14.0
> 
> > all fail.
> 

> > The call:
> 
> > call MPI_Win_allocate_shared(win_size, disp_unit, MPI_INFO_NULL,
> > comm_group,
> > cp1, win, ierr)
> 

> > Should allocate memory only within the group. But when the other group
> > allocates memory, the pointers from the two groups point to the same
> > a

[OMPI users] shared memory under fortran, bug?

2016-02-02 Thread Peter Wind
Enclosed is a short (< 100 lines) Fortran code example that uses shared memory.
It seems to me it behaves wrongly if Open MPI is used.
Compiled with SGI/MPT, it gives the right result.

To fail, the code must be run on a single node.
It creates two groups of 2 processes each. Within each group memory is shared.
The error is that the two groups get the same memory allocated, but they
should not.

Tested with Open MPI 1.8.4, 1.8.5 and 1.10.2, and with gfortran, Intel 13.0
and Intel 14.0; all fail.

The call:
   call MPI_Win_allocate_shared(win_size, disp_unit, MPI_INFO_NULL, comm_group, 
cp1, win, ierr)

should allocate memory only within the group. But when the other group
allocates memory, the pointers from the two groups point to the same address
in memory.

Could you please confirm that this is the wrong behaviour? 

Best regards,
Peter Wind

program shmem_mpi

   !
   ! in this example two groups are created, within each group memory is shared.
   ! Still, the other group gets allocated the same address space, which it shouldn't.
   !
   ! Run with 4 processes, mpirun -np 4 a.out


   use mpi

   use, intrinsic :: iso_c_binding, only : c_ptr, c_f_pointer

   implicit none
!   include 'mpif.h'

   integer, parameter :: nsize = 100
   integer, pointer   :: array(:)
   integer:: num_procs
   integer:: ierr
   integer:: irank, irank_group
   integer:: win
   integer:: comm = MPI_COMM_WORLD
   integer:: disp_unit
   type(c_ptr):: cp1
   type(c_ptr):: cp2
   integer:: comm_group

   integer(MPI_ADDRESS_KIND) :: win_size
   integer(MPI_ADDRESS_KIND) :: segment_size

   call MPI_Init(ierr)
   call MPI_Comm_size(comm, num_procs, ierr)
   call MPI_Comm_rank(comm, irank, ierr)

   disp_unit = sizeof(1)
   call MPI_COMM_SPLIT(comm, irank*2/num_procs, irank, comm_group, ierr)
   call MPI_Comm_rank(comm_group, irank_group, ierr)
!   print *, 'irank=', irank, ' group rank=', irank_group

   if (irank_group == 0) then
  win_size = nsize*disp_unit
   else
  win_size = 0
   endif

   call MPI_Win_allocate_shared(win_size, disp_unit, MPI_INFO_NULL, comm_group, cp1, win, ierr)
   call MPI_Win_fence(0, win, ierr)

   call MPI_Win_shared_query(win, 0, segment_size, disp_unit, cp2, ierr)

   call MPI_Win_fence(0, win, ierr)
   CALL MPI_BARRIER(comm, ierr)! allocations finished
!   print *, 'irank=', irank, ' size ', segment_size

   call c_f_pointer(cp2, array, [nsize])

   array(1)=0;array(2)=0
   CALL MPI_BARRIER(comm, ierr)!
77 format(4(A,I3))
   if(irank
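
A hedged C sketch of the same two-group test (illustrative only, not the
original attachment): split the world in two, allocate a shared window per
group, tag each group's segment, and check that the two groups do not alias.

#include <mpi.h>
#include <stdio.h>

#define NSIZE 100

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int world_rank, world_size;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    /* Split the world into two halves; memory is shared only inside a half. */
    int color = world_rank * 2 / world_size;
    MPI_Comm group_comm;
    MPI_Comm_split(MPI_COMM_WORLD, color, world_rank, &group_comm);

    int group_rank;
    MPI_Comm_rank(group_comm, &group_rank);

    /* Only group rank 0 contributes memory, as in the Fortran test. */
    MPI_Aint size = (group_rank == 0) ? NSIZE * (MPI_Aint)sizeof(int) : 0;
    int *base = NULL;
    MPI_Win win;
    MPI_Win_allocate_shared(size, sizeof(int), MPI_INFO_NULL,
                            group_comm, &base, &win);

    /* Everyone in the group maps the segment owned by group rank 0. */
    MPI_Aint qsize;
    int qdisp;
    int *shared = NULL;
    MPI_Win_shared_query(win, 0, &qsize, &qdisp, &shared);

    MPI_Win_fence(0, win);
    if (group_rank == 0) shared[0] = color;  /* tag the segment with the group id */
    MPI_Win_fence(0, win);
    MPI_Barrier(MPI_COMM_WORLD);

    /* With separate allocations per group, every process must read back its
       own group's tag; if the two groups alias, one of the tags is lost. */
    printf("world rank %d (group %d) reads tag %d\n",
           world_rank, color, shared[0]);

    MPI_Win_free(&win);
    MPI_Comm_free(&group_comm);
    MPI_Finalize();
    return 0;
}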