Re: [OMPI users] shared memory zero size segment
You may be right semantically. But the sentence "the first address in the memory segment of process i is consecutive with the last address in the memory segment of process i - 1" is also not easy to interpret correctly for a zero-size segment. There may be good reasons not to allocate the pointer for a zero-size segment. What I am trying to say is that a new user reading the documentation will not expect this behaviour before trying it out. Couldn't a small sentence in the documentation, such as "the pointer should not be used for zero-size segments", clarify this?

Peter

- Original Message -
> On Thu, Feb 11, 2016 at 02:17:40PM +, Peter Wind wrote:
> > I would add that the present situation is bound to give problems for
> > some users.
> > It is natural to divide an array in segments, each process treating its
> > own segment, but needing to read adjacent segments too.
> > MPI_Win_allocate_shared seems to be designed for this.
> > This will work fine as long as no segment has size zero. It can also be
> > expected that most testing would be done with all segments larger than
> > zero.
> > The document adding "size = 0 is valid" would also make people confident
> > that it will be consistent for that special case too.
>
> Nope, that statement says it's ok for a rank to specify that the local
> shared memory segment is 0 bytes. Nothing more. The standard
> unfortunately does not define what pointer value is returned for a rank
> that specifies size = 0. Not sure if the RMA working group intentionally
> left that undefined... Anyway, Open MPI does not appear to be out of
> compliance with the standard here.
>
> To be safe you should use MPI_Win_shared_query as suggested. You can
> pass MPI_PROC_NULL as the rank to get the pointer for the first non-zero
> sized segment in the shared memory window.
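Nathan's suggestion can be sketched in Fortran as follows. This is a minimal, untested fragment (not a complete program): it assumes a window `win` has already been created with MPI_Win_allocate_shared, and the names `base_ptr`, `seg_size` and `query_disp_unit` are illustrative.

```fortran
! Query the window instead of relying on the pointer returned by
! MPI_Win_allocate_shared.  With MPI_PROC_NULL as the rank argument,
! MPI_Win_shared_query returns the base address of the first segment
! of non-zero size in the shared window, which is well defined even
! on ranks whose own segment is empty.
type(c_ptr)               :: base_ptr
integer(MPI_ADDRESS_KIND) :: seg_size
integer                   :: query_disp_unit, ierr

call MPI_Win_shared_query(win, MPI_PROC_NULL, seg_size, &
                          query_disp_unit, base_ptr, ierr)
! base_ptr can then be converted with c_f_pointer and indexed using
! offsets computed from the per-rank segment sizes.
```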
>
> -Nathan
> HPC-5, LANL
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2016/02/28506.php
Re: [OMPI users] shared memory zero size segment
Thanks Jeff, that was an interesting result. The pointers there are well defined, also for the zero-size segment. However I can't reproduce your output. I still get null pointers (output below). (I tried both the 1.8.5 and 1.10.2 versions.) What could be the difference?

Peter

mpirun -np 4 a.out
query: me=0, them=0, size=0, disp=1, base=(nil)
query: me=0, them=1, size=4, disp=1, base=0x2aee280030d0
query: me=0, them=2, size=4, disp=1, base=0x2aee280030d4
query: me=0, them=3, size=4, disp=1, base=0x2aee280030d8
query: me=0, them=PROC_NULL, size=4, disp=1, base=0x2aee280030d0
query: me=1, them=0, size=0, disp=1, base=(nil)
query: me=1, them=1, size=4, disp=1, base=0x2aabbb9ce0d0
query: me=1, them=2, size=4, disp=1, base=0x2aabbb9ce0d4
query: me=1, them=3, size=4, disp=1, base=0x2aabbb9ce0d8
query: me=1, them=PROC_NULL, size=4, disp=1, base=0x2aabbb9ce0d0
query: me=2, them=0, size=0, disp=1, base=(nil)
query: me=2, them=1, size=4, disp=1, base=0x2b1579dd40d0
query: me=2, them=2, size=4, disp=1, base=0x2b1579dd40d4
query: me=2, them=3, size=4, disp=1, base=0x2b1579dd40d8
query: me=2, them=PROC_NULL, size=4, disp=1, base=0x2b1579dd40d0
query: me=3, them=0, size=0, disp=1, base=(nil)
query: me=3, them=1, size=4, disp=1, base=0x2ac8d2c350d0
query: me=3, them=2, size=4, disp=1, base=0x2ac8d2c350d4
query: me=3, them=3, size=4, disp=1, base=0x2ac8d2c350d8
query: me=3, them=PROC_NULL, size=4, disp=1, base=0x2ac8d2c350d0

- Original Message -
> See attached. Output below. Note that the base you get for ranks 0 and 1 is
> the same, so you need to use the fact that size=0 at rank=0 to know not to
> dereference that pointer and expect to be writing into rank 0's memory,
> since you will write into rank 1's.
> I would probably add "if (size==0) base=NULL;" for good measure.
> Jeff
>
> $ mpirun -n 4 ./a.out
> query: me=0, them=0, size=0, disp=1, base=0x10bd64000
> query: me=0, them=1, size=4, disp=1, base=0x10bd64000
> query: me=0, them=2, size=4, disp=1, base=0x10bd64004
> query: me=0, them=3, size=4, disp=1, base=0x10bd64008
> query: me=0, them=PROC_NULL, size=4, disp=1, base=0x10bd64000
> query: me=1, them=0, size=0, disp=1, base=0x102d3b000
> query: me=1, them=1, size=4, disp=1, base=0x102d3b000
> query: me=1, them=2, size=4, disp=1, base=0x102d3b004
> query: me=1, them=3, size=4, disp=1, base=0x102d3b008
> query: me=1, them=PROC_NULL, size=4, disp=1, base=0x102d3b000
> query: me=2, them=0, size=0, disp=1, base=0x10aac1000
> query: me=2, them=1, size=4, disp=1, base=0x10aac1000
> query: me=2, them=2, size=4, disp=1, base=0x10aac1004
> query: me=2, them=3, size=4, disp=1, base=0x10aac1008
> query: me=2, them=PROC_NULL, size=4, disp=1, base=0x10aac1000
> query: me=3, them=0, size=0, disp=1, base=0x100fa2000
> query: me=3, them=1, size=4, disp=1, base=0x100fa2000
> query: me=3, them=2, size=4, disp=1, base=0x100fa2004
> query: me=3, them=3, size=4, disp=1, base=0x100fa2008
> query: me=3, them=PROC_NULL, size=4, disp=1, base=0x100fa2000
>
> On Thu, Feb 11, 2016 at 8:55 AM, Jeff Hammond < jeff.scie...@gmail.com >
> wrote:
> > On Thu, Feb 11, 2016 at 8:46 AM, Nathan Hjelm < hje...@lanl.gov > wrote:
> > > On Thu, Feb 11, 2016 at 02:17:40PM +, Peter Wind wrote:
> > > > I would add that the present situation is bound to give problems for
> > > > some users.
> > > > It is natural to divide an array in segments, each process treating
> > > > its own segment, but needing to read adjacent segments too.
> > > > MPI_Win_allocate_shared seems to be designed for this.
> > > > This will work fine as long as no segment has size zero. It can also
> > > > be expected that most testing would be done with all segments larger
> > > > than zero.
> > > > The document adding "size = 0 is valid" would also make people
> > > > confident that it will be consistent for that special case too.
> > >
> > > Nope, that statement says it's ok for a rank to specify that the local
> > > shared memory segment is 0 bytes. Nothing more. The standard
> > > unfortunately does not define what pointer value is returned for a rank
> > > that specifies size = 0. Not sure if the RMA working group intentionally
> > > left that undefined... Anyway, Open MPI does not appear to be out of
> > > compliance with the standard here.
> >
> > MPI_Alloc_mem doesn't say what happens if you pass size=0 either. The RMA
> > working group intentionally tries to maintain co
Re: [OMPI users] shared memory zero size segment
I would add that the present situation is bound to give problems for some users. It is natural to divide an array in segments, each process treating its own segment, but needing to read adjacent segments too. MPI_Win_allocate_shared seems to be designed for this. This will work fine as long as no segment has size zero. It can also be expected that most testing would be done with all segments larger than zero. The document adding "size = 0 is valid" would also make people confident that it will be consistent for that special case too. Then, long down the road of the development of a particular code, some special case will use a segment of size zero, and it will be hard to trace this error back to the MPI library.

Peter

- Original Message -
Yes, that is what I meant. Enclosed is a C example. The point is that the code would logically make sense for task 0, but since it asks for a segment of size=0, it only gets a null pointer, which cannot be used to access the shared parts.

Peter

- Original Message -
I think Peter's point is that if
- the window uses contiguous memory
*and*
- all tasks know how much memory was allocated by all other tasks in the window
then it could/should be possible to get rid of MPI_Win_shared_query.

that is likely true if no task allocates zero bytes.
now, if a task allocates zero bytes, MPI_Win_allocate_shared could return a null pointer and hence makes MPI_Win_shared_query usage mandatory.

in his example, task 0 allocates zero bytes, so he was expecting the returned pointer on task zero to point to the memory allocated by task 1.

if "may enable" should be read as "does enable", then returning a null pointer can be seen as a bug.
if "may enable" can be read as "does not always enable", then returning a null pointer is compliant with the standard.

I am clearly not good at reading/interpreting the standard, so using MPI_Win_shared_query is my recommended way to get it to work.
(feel free to call it "bulletproof", "overkill", or even "right")

Cheers,

Gilles

On Thursday, February 11, 2016, Jeff Hammond < jeff.scie...@gmail.com > wrote:

On Wed, Feb 10, 2016 at 8:44 AM, Peter Wind < peter.w...@met.no > wrote:

I agree that in practice the best approach would be to use Win_shared_query.

Still I am confused by this part in the documentation:
"The allocated memory is contiguous across process ranks unless the info key alloc_shared_noncontig is specified. Contiguous across process ranks means that the first address in the memory segment of process i is consecutive with the last address in the memory segment of process i - 1. This may enable the user to calculate remote address offsets with local information only."

Isn't this an encouragement to use the pointer of Win_allocate_shared directly?

No, it is not. Win_allocate_shared only gives you the pointer to the portion of the allocation that is owned by the calling process. If you want to access the whole slab, call Win_shared_query(..,rank=0,..) and use the resulting baseptr.

I attempted to modify your code to be more correct, but I don't know enough Fortran to get it right. If you can parse C examples, I'll provide some of those.

Jeff

Peter

I don't know about bulletproof, but Win_shared_query is the *only* valid way to get the addresses of memory in other processes associated with a window.

The default for Win_allocate_shared is contiguous memory, but it can and likely will be mapped differently into each process, in which case only relative offsets are transferrable.

Jeff

On Wed, Feb 10, 2016 at 4:19 AM, Gilles Gouaillardet < gilles.gouaillar...@gmail.com > wrote:

Peter,

The bulletproof way is to use MPI_Win_shared_query after MPI_Win_allocate_shared.

I do not know if current behavior is a bug or a feature...

Cheers,

Gilles

On Wednesday, February 10, 2016, Peter Wind < peter.w...@met.no > wrote:

Hi,

Under Fortran, MPI_Win_allocate_shared is called with a window size of zero for some processes. The output pointer is then not valid for these processes (null pointer). Did I understand this wrongly? Shouldn't the pointers be contiguous, so that for a zero-sized window, the pointer should point to the start of the segment of the next rank? The documentation explicitly specifies "size = 0 is valid".

Attached is a small code, where rank=0 allocates a window of size zero. All the other ranks get valid pointers, except rank 0.

Best regards,

Peter

_______________________________________________
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: http://www.open-mpi.org/community/lists/users/2016/02/28485.php
Re: [OMPI users] shared memory zero size segment
Yes, that is what I meant. Enclosed is a C example. The point is that the code would logically make sense for task 0, but since it asks for a segment of size=0, it only gets a null pointer, which cannot be used to access the shared parts.

Peter

- Original Message -
> I think Peter's point is that if
> - the window uses contiguous memory
> *and*
> - all tasks know how much memory was allocated by all other tasks in the
> window
> then it could/should be possible to get rid of MPI_Win_shared_query
> that is likely true if no task allocates zero bytes.
> now, if a task allocates zero bytes, MPI_Win_allocate_shared could return a
> null pointer and hence makes MPI_Win_shared_query usage mandatory.
> in his example, task 0 allocates zero bytes, so he was expecting the
> returned pointer on task zero to point to the memory allocated by task 1.
> if "may enable" should be read as "does enable", then returning a null
> pointer can be seen as a bug.
> if "may enable" can be read as "does not always enable", then returning a
> null pointer is compliant with the standard.
> I am clearly not good at reading/interpreting the standard, so using
> MPI_Win_shared_query is my recommended way to get it to work.
> (feel free to call it "bulletproof", "overkill", or even "right")
> Cheers,
> Gilles
> On Thursday, February 11, 2016, Jeff Hammond < jeff.scie...@gmail.com >
> wrote:
> > On Wed, Feb 10, 2016 at 8:44 AM, Peter Wind < peter.w...@met.no > wrote:
> > > I agree that in practice the best approach would be to use
> > > Win_shared_query.
> > > Still I am confused by this part in the documentation:
> > > "The allocated memory is contiguous across process ranks unless the
> > > info key alloc_shared_noncontig is specified. Contiguous across process
> > > ranks means that the first address in the memory segment of process i
> > > is consecutive with the last address in the memory segment of process
> > > i - 1. This may enable the user to calculate remote address offsets
> > > with local information only."
> > > Isn't this an encouragement to use the pointer of Win_allocate_shared
> > > directly?
> > No, it is not. Win_allocate_shared only gives you the pointer to the
> > portion of the allocation that is owned by the calling process. If you
> > want to access the whole slab, call Win_shared_query(..,rank=0,..) and
> > use the resulting baseptr.
> > I attempted to modify your code to be more correct, but I don't know
> > enough Fortran to get it right. If you can parse C examples, I'll provide
> > some of those.
> > Jeff
> > > Peter
> > > > I don't know about bulletproof, but Win_shared_query is the *only*
> > > > valid way to get the addresses of memory in other processes
> > > > associated with a window.
> > > > The default for Win_allocate_shared is contiguous memory, but it can
> > > > and likely will be mapped differently into each process, in which
> > > > case only relative offsets are transferrable.
> > > > Jeff
> > > > On Wed, Feb 10, 2016 at 4:19 AM, Gilles Gouaillardet <
> > > > gilles.gouaillar...@gmail.com > wrote:
> > > > > Peter,
> > > > > The bulletproof way is to use MPI_Win_shared_query after
> > > > > MPI_Win_allocate_shared.
> > > > > I do not know if current behavior is a bug or a feature...
> > > > > Cheers,
> > > > > Gilles
> > > > > On Wednesday, February 10, 2016, Peter Wind < peter.w...@met.no >
> > > > > wrote:
> > > > > > Hi,
> > > > > > Under Fortran, MPI_Win_allocate_shared is called with a window
> > > > > > size of zero f
Re: [OMPI users] shared memory zero size segment
I agree that in practice the best approach would be to use Win_shared_query.

Still I am confused by this part in the documentation:
"The allocated memory is contiguous across process ranks unless the info key alloc_shared_noncontig is specified. Contiguous across process ranks means that the first address in the memory segment of process i is consecutive with the last address in the memory segment of process i - 1. This may enable the user to calculate remote address offsets with local information only."

Isn't this an encouragement to use the pointer of Win_allocate_shared directly?

Peter

- Original Message -
> I don't know about bulletproof, but Win_shared_query is the *only* valid
> way to get the addresses of memory in other processes associated with a
> window.
> The default for Win_allocate_shared is contiguous memory, but it can and
> likely will be mapped differently into each process, in which case only
> relative offsets are transferrable.
> Jeff
> On Wed, Feb 10, 2016 at 4:19 AM, Gilles Gouaillardet <
> gilles.gouaillar...@gmail.com > wrote:
> > Peter,
> > The bulletproof way is to use MPI_Win_shared_query after
> > MPI_Win_allocate_shared.
> > I do not know if current behavior is a bug or a feature...
> > Cheers,
> > Gilles
> > On Wednesday, February 10, 2016, Peter Wind < peter.w...@met.no > wrote:
> > > Hi,
> > > Under Fortran, MPI_Win_allocate_shared is called with a window size of
> > > zero for some processes.
> > > The output pointer is then not valid for these processes (null
> > > pointer).
> > > Did I understand this wrongly? Shouldn't the pointers be contiguous, so
> > > that for a zero-sized window, the pointer should point to the start of
> > > the segment of the next rank?
> > > The documentation explicitly specifies "size = 0 is valid".
> > > Attached is a small code, where rank=0 allocates a window of size zero.
> > > All the other ranks get valid pointers, except rank 0.
> > > Best regards,
> > > Peter
> > > _______________________________________________
> > > users mailing list
> > > us...@open-mpi.org
> > > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> > > Link to this post:
> > > http://www.open-mpi.org/community/lists/users/2016/02/28485.php
> > Link to this post:
> > http://www.open-mpi.org/community/lists/users/2016/02/28493.php
> --
> Jeff Hammond
> jeff.scie...@gmail.com
> http://jeffhammond.github.io/
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2016/02/28496.php
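The "remote address offsets with local information" idea can be made safe by asking the window itself where each segment starts. A hedged Fortran sketch (fragment only; it assumes a contiguous window `win` created over a communicator with `num_procs` ranks, and the names `q_ptr`, `q_size` and `q_disp` are illustrative):

```fortran
! For each rank, query the window for that rank's segment base rather
! than computing it from the locally returned pointer.  This works
! whether or not some segments have size zero.
integer                   :: r, q_disp, ierr
type(c_ptr)               :: q_ptr
integer(MPI_ADDRESS_KIND) :: q_size

do r = 0, num_procs-1
   call MPI_Win_shared_query(win, r, q_size, q_disp, q_ptr, ierr)
   ! q_ptr is the base of rank r's segment; q_size tells the caller
   ! whether that segment may be dereferenced at all.
end do
```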
Re: [OMPI users] error openmpi check hdf5
Sorry, this was the wrong thread! Please disregard the last answer (1.8.5 and 1.10.2...).

Peter

- Original Message -
> I have tested 1.8.5 and 1.10.2, both fail. (And Intel and Gnu compilers.)
> Peter
> - Original Message -
> > which version of Open MPI is this?
> > Thanks
> > Edgar
> > On 2/10/2016 4:13 AM, Delphine Ramalingom wrote:
> > > Hello,
> > > I try to compile a parallel version of hdf5.
> > > I have error messages when I check with openmpi.
> > > Support on HDF5 told me that the errors seem related to the new ompio
> > > implementation inside open MPI for MPI-I/O. He suggests that I talk to
> > > the OMPI mailing list to resolve these errors.
> > > For information, my version of openmpi is : gcc (GCC) 4.8.2
> > > mpicc --showme
> > > gcc -I/programs/Compilateurs2/usr/include -pthread -Wl,-rpath
> > > -Wl,/programs/Compilateurs2/usr/lib -Wl,--enable-new-dtags
> > > -L/programs/Compilateurs2/usr/lib -lmpi
> > > Errors are :
> > > .../hdf5-1.8.16_gnu/testpar/.libs/t_mpi: symbol lookup error:
> > > /programs/Compilateurs2/usr/lib/openmpi/mca_fcoll_two_phase.so:
> > > undefined symbol: ompi_io_ompio_decode_datatype
> > > .../hdf5-1.8.16_gnu/testpar/.libs/t_mpi: symbol lookup error:
> > > /programs/Compilateurs2/usr/lib/openmpi/mca_fcoll_two_phase.so:
> > > undefined symbol: ompi_io_ompio_decode_datatype
> > > ---
> > > Primary job terminated normally, but 1 process returned
> > > a non-zero exit code. Per user-direction, the job has been aborted.
> > > > > > --- > > > > > > .../hdf5-1.8.16_gnu/testpar/.libs/t_mpi: symbol lookup error: > > > /programs/Compilateurs2/usr/lib/openmpi/mca_fcoll_two_phase.so: undefined > > > symbol: ompi_io_ompio_set_aggregator_props > > > > > > .../hdf5-1.8.16_gnu/testpar/.libs/t_mpi: symbol lookup error: > > > /programs/Compilateurs2/usr/lib/openmpi/mca_fcoll_two_phase.so: undefined > > > symbol: ompi_io_ompio_set_aggregator_props > > > > > > .../hdf5-1.8.16_gnu/testpar/.libs/t_mpi: symbol lookup error: > > > /programs/Compilateurs2/usr/lib/openmpi/mca_fcoll_two_phase.so: undefined > > > symbol: ompi_io_ompio_set_aggregator_props > > > > > > Thanks in advance for your help. > > > > > > Regards > > > > > > Delphine > > > > > > -- > > > > > > Delphine Ramalingom Barbary | Ingénieure en Calcul Scientifique > > > > > > Direction des Usages du Numérique (DUN) > > > > > > Centre de Développement du Calcul Scientifique > > > > > > TEL : 02 62 93 84 87- FAX : 02 62 93 81 06 > > > > > -- > > > Edgar Gabriel > > > Associate Professor > > > Parallel Software Technologies Lab http://pstl.cs.uh.edu Department of > > Computer Science University of Houston > > > Philip G. Hoffman Hall, Room 524Houston, TX-77204, USA > > > Tel: +1 (713) 743-3857 Fax: +1 (713) 743-3335 > > > -- > > > ___ > > > users mailing list > > > us...@open-mpi.org > > > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users > > > Link to this post: > > http://www.open-mpi.org/community/lists/users/2016/02/28489.php > > ___ > users mailing list > us...@open-mpi.org > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users > Link to this post: > http://www.open-mpi.org/community/lists/users/2016/02/28490.php
Re: [OMPI users] error openmpi check hdf5
I have tested 1.8.5 and 1.10.2, both fail. (And Intel and Gnu compilers). Peter - Original Message - > which version of Open MPI is this? > Thanks > Edgar > On 2/10/2016 4:13 AM, Delphine Ramalingom wrote: > > Hello, > > > I try to compile a parallel version of hdf5. > > > I have error messages when I check with openmpi. > > > Support on HDF5 told me that the errors seem related to the new ompio > > implementation inside > > > open MPI for MPI-I/O. He suggests that I talk to the OMPI mailing list to > > resolve these errors. > > > For information, my version of openmpi is : gcc (GCC) 4.8.2 > > > mpicc --showme > > > gcc -I/programs/Compilateurs2/usr/include -pthread -Wl,-rpath > > -Wl,/programs/Compilateurs2/usr/lib -Wl,--enable-new-dtags > > -L/programs/Compilateurs2/usr/lib -lmpi > > > Errors are : > > > .../hdf5-1.8.16_gnu/testpar/.libs/t_mpi: symbol lookup error: > > /programs/Compilateurs2/usr/lib/openmpi/mca_fcoll_two_phase.so: undefined > > symbol: ompi_io_ompio_decode_datatype > > > .../hdf5-1.8.16_gnu/testpar/.libs/t_mpi: symbol lookup error: > > /programs/Compilateurs2/usr/lib/openmpi/mca_fcoll_two_phase.so: undefined > > symbol: ompi_io_ompio_decode_datatype > > > --- > > > Primary job terminated normally, but 1 process returned > > > a non-zero exit code.. Per user-direction, the job has been aborted. > > > --- > > > .../hdf5-1.8.16_gnu/testpar/.libs/t_mpi: symbol lookup error: > > /programs/Compilateurs2/usr/lib/openmpi/mca_fcoll_two_phase.so: undefined > > symbol: ompi_io_ompio_set_aggregator_props > > > .../hdf5-1.8.16_gnu/testpar/.libs/t_mpi: symbol lookup error: > > /programs/Compilateurs2/usr/lib/openmpi/mca_fcoll_two_phase.so: undefined > > symbol: ompi_io_ompio_set_aggregator_props > > > .../hdf5-1.8.16_gnu/testpar/.libs/t_mpi: symbol lookup error: > > /programs/Compilateurs2/usr/lib/openmpi/mca_fcoll_two_phase.so: undefined > > symbol: ompi_io_ompio_set_aggregator_props > > > Thanks in advance for your help. 
> > > Regards > > > Delphine > > > -- > > > Delphine Ramalingom Barbary | Ingénieure en Calcul Scientifique > > > Direction des Usages du Numérique (DUN) > > > Centre de Développement du Calcul Scientifique > > > TEL : 02 62 93 84 87- FAX : 02 62 93 81 06 > > -- > Edgar Gabriel > Associate Professor > Parallel Software Technologies Lab http://pstl.cs.uh.edu Department of > Computer Science University of Houston > Philip G. Hoffman Hall, Room 524Houston, TX-77204, USA > Tel: +1 (713) 743-3857 Fax: +1 (713) 743-3335 > -- > ___ > users mailing list > us...@open-mpi.org > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users > Link to this post: > http://www.open-mpi.org/community/lists/users/2016/02/28489.php
Re: [OMPI users] shared memory zero size segment
Sorry for that, here is the attachment!

Peter

- Original Message -
> Peter --
>
> Somewhere along the way, your attachment got lost. Could you re-send?
>
> Thanks.
>
> > On Feb 10, 2016, at 5:56 AM, Peter Wind wrote:
> >
> > Hi,
> >
> > Under Fortran, MPI_Win_allocate_shared is called with a window size of
> > zero for some processes.
> > The output pointer is then not valid for these processes (null pointer).
> > Did I understand this wrongly? Shouldn't the pointers be contiguous, so
> > that for a zero-sized window, the pointer should point to the start of
> > the segment of the next rank?
> > The documentation explicitly specifies "size = 0 is valid".
> >
> > Attached is a small code, where rank=0 allocates a window of size zero.
> > All the other ranks get valid pointers, except rank 0.
> >
> > Best regards,
> > Peter
> > _______________________________________________
> > users mailing list
> > us...@open-mpi.org
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> > Link to this post:
> > http://www.open-mpi.org/community/lists/users/2016/02/28485.php
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2016/02/28486.php

program sharetest
   ! test zero size segment.
   ! run on at least 3 cpus
   ! mpirun -np 4 a.out
   use mpi
   use, intrinsic :: iso_c_binding
   implicit none
   integer, parameter :: nsize = 20
   integer, pointer   :: array(:)
   integer            :: num_procs
   integer            :: ierr
   integer            :: irank, irank_group
   integer            :: win
   integer            :: disp_unit
   type(c_ptr)        :: cp1
   type(c_ptr)        :: cp2
   integer(MPI_ADDRESS_KIND) :: win_size
   integer(MPI_ADDRESS_KIND) :: segment_size

   call MPI_Init(ierr)
   call MPI_Comm_size(MPI_COMM_WORLD, num_procs, ierr)
   call MPI_Comm_rank(MPI_COMM_WORLD, irank, ierr)

   disp_unit = sizeof(1)
   win_size = irank*disp_unit
   call MPI_Win_allocate_shared(win_size, disp_unit, MPI_INFO_NULL, &
                                MPI_COMM_WORLD, cp1, win, ierr)
   ! write(*,*)'rank ', irank,', pointer ',cp1
   call c_f_pointer(cp1, array, [nsize])

77 format(4(A,I3))
   if(irank/=0)then
      array(1)=irank
      CALL MPI_BARRIER(MPI_COMM_WORLD, ierr)
      if(irank/=num_procs-1)then
         print 77, ' rank', irank, ': array(1)', array(1), &
                   ' shared with next rank: ', array(irank+1)
      else
         print 77, ' rank', irank, ': array(1)', array(1), &
                   ' shared with previous rank: ', array(0)
      endif
      CALL MPI_BARRIER(MPI_COMM_WORLD, ierr)
   else
      CALL MPI_BARRIER(MPI_COMM_WORLD, ierr)
      CALL MPI_BARRIER(MPI_COMM_WORLD, ierr)
      if(.not.associated(array))then
         print 77, 'zero pointer found, rank', irank
      else
         print 77, ' rank', irank, ' array associated '
         print 77, ' rank', irank, ': array(1) ', array(1), &
                   ' shared with next rank: ', array(irank+1)
      endif
   endif

   call MPI_Finalize(ierr)
end program sharetest
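For reference, here is a hedged sketch of how the rank-0 part of the program above could avoid the null pointer, following the MPI_Win_shared_query advice from this thread (fragment only; it reuses `win`, `array`, `nsize` and `ierr` from the program, and `base_ptr`, `seg_size` and `q_disp` are illustrative names):

```fortran
! Instead of c_f_pointer(cp1, ...), obtain the base of the first
! non-empty segment from the window itself; MPI_PROC_NULL as the rank
! argument makes this well defined even on ranks that allocated zero
! bytes.
type(c_ptr)               :: base_ptr
integer(MPI_ADDRESS_KIND) :: seg_size
integer                   :: q_disp

call MPI_Win_shared_query(win, MPI_PROC_NULL, seg_size, q_disp, &
                          base_ptr, ierr)
call c_f_pointer(base_ptr, array, [nsize])
! array(1) now refers to the first element of the first non-empty
! segment on every rank, including rank 0.
```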
[OMPI users] shared memory zero size segment
Hi,

Under Fortran, MPI_Win_allocate_shared is called with a window size of zero for some processes. The output pointer is then not valid for these processes (null pointer). Did I understand this wrongly? Shouldn't the pointers be contiguous, so that for a zero-sized window, the pointer should point to the start of the segment of the next rank? The documentation explicitly specifies "size = 0 is valid".

Attached is a small code, where rank=0 allocates a window of size zero. All the other ranks get valid pointers, except rank 0.

Best regards,
Peter
Re: [OMPI users] shared memory under fortran, bug?
That worked! I.e. with the change you proposed, the code gives the right result. That was efficient work, thank you Gilles :)

Best wishes,
Peter

- Original Message -
> Thanks Peter,
> that is quite unexpected ...
> let's try another workaround: can you replace
>    integer :: comm_group
> with
>    integer :: comm_group, comm_tmp
> and
>    call MPI_COMM_SPLIT(comm, irank*2/num_procs, irank, comm_group, ierr)
> with
>    call MPI_COMM_SPLIT(comm, irank*2/num_procs, irank, comm_tmp, ierr)
>    if (irank < (num_procs/2)) then
>       comm_group = comm_tmp
>    else
>       call MPI_Comm_dup(comm_tmp, comm_group, ierr)
>    endif
> if it works, I will make a fix tomorrow when I can access my workstation.
> if not, can you please run
>    mpirun --mca osc_base_verbose 100 ...
> and post the output ?
> I will then try to reproduce the issue and investigate it
> Cheers,
> Gilles
> On Tuesday, February 2, 2016, Peter Wind < peter.w...@met.no > wrote:
> > Thanks Gilles,
> > I get the following output (I guess it is not what you wanted?).
> > Peter
> > $ mpirun --mca osc pt2pt -np 4 a.out
> > --------------------------------------------------------------------------
> > A requested component was not found, or was unable to be opened. This
> > means that this component is either not installed or is unable to be
> > used on your system (e.g., sometimes this means that shared libraries
> > that the component requires are unable to be found/loaded). Note that
> > Open MPI stopped checking at the first component that it did not find.
> >
> > Host: stallo-2.local
> > Framework: osc
> > Component: pt2pt
> > --------------------------------------------------------------------------
> > It looks like MPI_INIT failed for some reason; your parallel process is
> > likely to abort. There are many reasons that a parallel process can
> > fail during MPI_INIT; some of which are due to configuration or
> > environment problems.
This failure appears to be an internal failure; here's some > > > additional information (which may only be relevant to an Open MPI > > > developer): > > > ompi_osc_base_open() failed > > > --> Returned "Not found" (-13) instead of "Success" (0) > > > -- > > > *** An error occurred in MPI_Init > > > *** on a NULL communicator > > > *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort, > > > *** and potentially your MPI job) > > > [stallo-2.local:38415] Local abort before MPI_INIT completed successfully; > > not able to aggregate error messages, and not able to guarantee that all > > other processes were killed! > > > *** An error occurred in MPI_Init > > > *** on a NULL communicator > > > *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort, > > > *** and potentially your MPI job) > > > [stallo-2.local:38418] Local abort before MPI_INIT completed successfully; > > not able to aggregate error messages, and not able to guarantee that all > > other processes were killed! > > > *** An error occurred in MPI_Init > > > *** on a NULL communicator > > > *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort, > > > *** and potentially your MPI job) > > > [stallo-2.local:38416] Local abort before MPI_INIT completed successfully; > > not able to aggregate error messages, and not able to guarantee that all > > other processes were killed! > > > *** An error occurred in MPI_Init > > > *** on a NULL communicator > > > *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort, > > > *** and potentially your MPI job) > > > [stallo-2.local:38417] Local abort before MPI_INIT completed successfully; > > not able to aggregate error messages, and not able to guarantee that all > > other processes were killed! > > > --- > > > Primary job terminated normally, but 1 process returned > > > a non-zero exit code.. Per user-direction, the job has been aborted. 
> > > --- > > > -- > > > mpirun detected that one or more processes exited with no
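(For readability, the workaround Gilles suggests in the quoted message, written out as a plain Fortran fragment; comm_tmp is the temporary communicator name from the mail. One group keeps the communicator from MPI_COMM_SPLIT, the other uses a duplicate, which apparently sidesteps the issue in the sm osc component:

   integer :: comm_group, comm_tmp

   call MPI_COMM_SPLIT(comm, irank*2/num_procs, irank, comm_tmp, ierr)
   if (irank < (num_procs/2)) then
      ! first group: use the split communicator directly
      comm_group = comm_tmp
   else
      ! second group: duplicate it, so the two groups do not end up
      ! backed by the same shared-memory allocation
      call MPI_Comm_dup(comm_tmp, comm_group, ierr)
   endif

As confirmed above, this change makes the attached test program give the right result.)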
Re: [OMPI users] shared memory under fortran, bug?
Thanks Gilles, I get the following output (I guess it is not what you wanted?). Peter $ mpirun --mca osc pt2pt -np 4 a.out -- A requested component was not found, or was unable to be opened. This means that this component is either not installed or is unable to be used on your system (e.g., sometimes this means that shared libraries that the component requires are unable to be found/loaded). Note that Open MPI stopped checking at the first component that it did not find. Host: stallo-2.local Framework: osc Component: pt2pt -- -- It looks like MPI_INIT failed for some reason; your parallel process is likely to abort. There are many reasons that a parallel process can fail during MPI_INIT; some of which are due to configuration or environment problems. This failure appears to be an internal failure; here's some additional information (which may only be relevant to an Open MPI developer): ompi_osc_base_open() failed --> Returned "Not found" (-13) instead of "Success" (0) -- *** An error occurred in MPI_Init *** on a NULL communicator *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort, *** and potentially your MPI job) [stallo-2.local:38415] Local abort before MPI_INIT completed successfully; not able to aggregate error messages, and not able to guarantee that all other processes were killed! *** An error occurred in MPI_Init *** on a NULL communicator *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort, *** and potentially your MPI job) [stallo-2.local:38418] Local abort before MPI_INIT completed successfully; not able to aggregate error messages, and not able to guarantee that all other processes were killed! 
*** An error occurred in MPI_Init *** on a NULL communicator *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort, *** and potentially your MPI job) [stallo-2.local:38416] Local abort before MPI_INIT completed successfully; not able to aggregate error messages, and not able to guarantee that all other processes were killed! *** An error occurred in MPI_Init *** on a NULL communicator *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort, *** and potentially your MPI job) [stallo-2.local:38417] Local abort before MPI_INIT completed successfully; not able to aggregate error messages, and not able to guarantee that all other processes were killed! --- Primary job terminated normally, but 1 process returned a non-zero exit code.. Per user-direction, the job has been aborted. --- -- mpirun detected that one or more processes exited with non-zero status, thus causing the job to be terminated. The first process to do so was: Process name: [[52507,1],0] Exit code: 1 -- [stallo-2.local:38410] 3 more processes have sent help message help-mca-base.txt / find-available:not-valid [stallo-2.local:38410] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages [stallo-2.local:38410] 2 more processes have sent help message help-mpi-runtime / mpi_init:startup:internal-failure - Original Message - > Peter, > at first glance, your test program looks correct. > can you please try to run > mpirun --mca osc pt2pt -np 4 ... > I might have identified a bug with the sm osc component. > Cheers, > Gilles > On Tuesday, February 2, 2016, Peter Wind < peter.w...@met.no > wrote: > > Enclosed is a short (< 100 lines) fortran code example that uses shared > > memory. > > > It seems to me it behaves wrongly if openmpi is used. > > > Compiled with SGI/mpt , it gives the right result. > > > To fail, the code must be run on a single node. > > > It creates two groups of 2 processes each. Within each group memory is > > shared. 
> > > The error is that the two groups get the same memory allocated, but they > > should not. > > > Tested with openmpi 1.8.4, 1.8.5, 1.10.2 and gfortran, intel 13.0, intel > > 14.0 > > > all fail. > > > The call: > > > call MPI_Win_allocate_shared(win_size, disp_unit, MPI_INFO_NULL, > > comm_group, > > cp1, win, ierr) > > > Should allocate memory only within the group. But when the other group > > allocates memory, the pointers from the two groups point to the same > > a
[OMPI users] shared memory under fortran, bug?
Enclosed is a short (< 100 lines) Fortran code example that uses shared memory. It seems to me it behaves wrongly if Open MPI is used. Compiled with SGI/mpt, it gives the right result. To fail, the code must be run on a single node. It creates two groups of 2 processes each. Within each group memory is shared. The error is that the two groups get the same memory allocated, but they should not. Tested with Open MPI 1.8.4, 1.8.5, 1.10.2 and gfortran, intel 13.0, intel 14.0; all fail. The call:

   call MPI_Win_allocate_shared(win_size, disp_unit, MPI_INFO_NULL, comm_group, cp1, win, ierr)

should allocate memory only within the group. But when the other group allocates memory, the pointers from the two groups point to the same address in memory. Could you please confirm that this is the wrong behaviour? Best regards, Peter Wind

program shmem_mpi
  !
  ! in this example two groups are created; within each group memory is shared.
  ! Still the other group gets allocated the same address space, which it shouldn't.
  !
  ! Run with 4 processes: mpirun -np 4 a.out
  use mpi
  use, intrinsic :: iso_c_binding, only : c_ptr, c_f_pointer
  implicit none
  ! include 'mpif.h'
  integer, parameter :: nsize = 100
  integer, pointer   :: array(:)
  integer            :: num_procs
  integer            :: ierr
  integer            :: irank, irank_group
  integer            :: win
  integer            :: comm = MPI_COMM_WORLD
  integer            :: disp_unit
  type(c_ptr)        :: cp1
  type(c_ptr)        :: cp2
  integer            :: comm_group
  integer(MPI_ADDRESS_KIND) :: win_size
  integer(MPI_ADDRESS_KIND) :: segment_size

  call MPI_Init(ierr)
  call MPI_Comm_size(comm, num_procs, ierr)
  call MPI_Comm_rank(comm, irank, ierr)
  disp_unit = sizeof(1)
  call MPI_COMM_SPLIT(comm, irank*2/num_procs, irank, comm_group, ierr)
  call MPI_Comm_rank(comm_group, irank_group, ierr)
  !
  print *, 'irank=', irank, ' group rank=', irank_group
  if (irank_group == 0) then
     win_size = nsize*disp_unit
  else
     win_size = 0
  endif
  call MPI_Win_allocate_shared(win_size, disp_unit, MPI_INFO_NULL, comm_group, cp1, win, ierr)
  call MPI_Win_fence(0, win, ierr)
  call MPI_Win_shared_query(win, 0, segment_size, disp_unit, cp2, ierr)
  call MPI_Win_fence(0, win, ierr)
  call MPI_BARRIER(comm, ierr) ! allocations finished
  !
  print *, 'irank=', irank, ' size ', segment_size
  call c_f_pointer(cp2, array, [nsize])
  array(1)=0; array(2)=0
  call MPI_BARRIER(comm, ierr)
  !
77 format(4(A,I3))
  if(irank