Re: [OMPI devel] OSC module change

2017-11-30 Thread Barrett, Brian via devel
One day, I should really go remember how all that code I wrote many moons ago 
works… :).  ompi_win_t has to have the group pointer so that the MPI layer can 
implement MPI_WIN_GET_GROUP.  I should have remembered that, rather than 
suggesting there was work to do.  Sorry about that.

Brian


Re: [OMPI devel] OSC module change

2017-11-30 Thread Jeff Squyres (jsquyres)
Woo hoo!  Thanks for doing that.  :-)



Re: [OMPI devel] OSC module change

2017-11-30 Thread Clement FOYER

Hi devels,

In fact, the communicator's group was already retained in the window 
structure, so everything was already in place. I pushed the final 
modifications, and everything seems ready to be merged in PR #4527.


Jeff, the fixup commits are squashed :)

Clément


On 11/30/2017 12:00 AM, Barrett, Brian via devel wrote:
The group is the easiest way to map a rank in the window to an 
ompi_proc_t, so it’s safe to say every window will have one (it is also 
a way of holding a reference to the ompi_proc_t).  So I think every OSC 
module has a group handle somewhere (directly or through the 
communicator).
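To make the point about groups concrete, here is a minimal sketch using hypothetical, simplified types (`sk_proc_t` and `sk_group_t` are illustrations, not Open MPI's real `ompi_proc_t` or `ompi_group_t`). A group is essentially an array indexed by rank, so mapping a window rank to its process descriptor is one array access, and the references the group holds keep those descriptors alive.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins; the real ompi_proc_t and ompi_group_t carry far
 * more state and use Open MPI's own object/refcount machinery. */
typedef struct sk_proc {
    int world_rank;  /* rank of this process in MPI_COMM_WORLD */
    int refcount;    /* number of holders of this descriptor */
} sk_proc_t;

typedef struct sk_group {
    int size;
    sk_proc_t **procs;  /* procs[r] describes the process with rank r */
} sk_group_t;

/* Build a group over existing proc descriptors, holding a reference on each. */
static sk_group_t *sk_group_create(sk_proc_t **procs, int size)
{
    sk_group_t *g = malloc(sizeof(*g));
    assert(g != NULL);
    g->size = size;
    g->procs = malloc((size_t)size * sizeof(*g->procs));
    assert(size == 0 || g->procs != NULL);
    for (int r = 0; r < size; ++r) {
        g->procs[r] = procs[r];
        procs[r]->refcount++;
    }
    return g;
}

/* Window rank -> process descriptor: a bounds check and an array index. */
static sk_proc_t *sk_group_peer(const sk_group_t *g, int rank)
{
    assert(rank >= 0 && rank < g->size);
    return g->procs[rank];
}

/* Tear down the group, dropping the references it held on its procs. */
static void sk_group_destroy(sk_group_t *g)
{
    for (int r = 0; r < g->size; ++r) {
        g->procs[r]->refcount--;
    }
    free(g->procs);
    free(g);
}
```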


Remember that in some implementations of the MTL, a communicator ID is 
a precious resource.  I don’t know where Portals 4 falls right now, 
but in various 64-bit tag matching implementations, it’s been as low 
as 4k communicators.  There’s no need for a cid if all you hold is a 
group reference.  Plus, a communicator has a bunch of other state 
(collective module handles, etc.) that isn’t necessarily needed by a 
window.
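Why a limit like "4k communicators" arises can be sketched with a 64-bit match word. The layout below (12 bits of communicator ID, 20 bits of source rank, 32 bits of tag) is an assumption for illustration only; real MTLs choose their own splits. With 12 bits for the ID, at most 4096 communicators can be distinguished at once.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 64-bit match-bit layout, for illustration only:
 *   bits 63..52  communicator ID (12 bits -> at most 4096 communicators)
 *   bits 51..32  source rank     (20 bits)
 *   bits 31..0   tag             (32 bits) */
#define SK_CID_BITS 12
#define SK_SRC_BITS 20
#define SK_TAG_BITS 32
#define SK_MAX_CIDS (UINT32_C(1) << SK_CID_BITS)
#define SK_MAX_SRC  (UINT32_C(1) << SK_SRC_BITS)

static uint64_t sk_pack_match_bits(uint32_t cid, uint32_t src, uint32_t tag)
{
    assert(cid < SK_MAX_CIDS);  /* every live communicator needs a distinct ID */
    assert(src < SK_MAX_SRC);
    return ((uint64_t)cid << (SK_SRC_BITS + SK_TAG_BITS))
         | ((uint64_t)src << SK_TAG_BITS)
         | (uint64_t)tag;
}

static uint32_t sk_match_cid(uint64_t bits)
{
    return (uint32_t)(bits >> (SK_SRC_BITS + SK_TAG_BITS));
}

static uint32_t sk_match_src(uint64_t bits)
{
    return (uint32_t)((bits >> SK_TAG_BITS) & (SK_MAX_SRC - 1));
}
```

Under this assumed layout, a window that quietly holds a duplicated communicator spends one of the 4096 IDs for its whole lifetime, while a window that holds only a group spends none.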


Brian

On Nov 29, 2017, at 5:57 AM, Clement FOYER wrote:


Hi Brian,

Even though I see your point, I don't think a user request to free the 
communicator should necessarily lead to the communicator being 
deleted; it should only release one hold, leaving the object to be 
disposed of by the library. I don't see any objection to having the 
library keep a hold on these communicators, since the user only holds 
a handle to the actual object.
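The "release one hold" semantics can be sketched with plain reference counting, similar in spirit to the OBJ_RETAIN/OBJ_RELEASE pattern Open MPI uses for its objects. The `sk_comm_t` type and its functions below are hypothetical: the user's free drops one hold, and the object is torn down only when the library's own hold is also gone.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical reference-counted stand-in for a communicator. */
typedef struct sk_comm {
    int refcount;
    bool destroyed;  /* set when the last hold is dropped */
} sk_comm_t;

/* Take an additional hold (e.g. when a window is created on the comm). */
static void sk_comm_retain(sk_comm_t *c)
{
    assert(!c->destroyed);
    c->refcount++;
}

/* Drop one hold. A user's MPI_Comm_free would map to this: the object is
 * only torn down (and its ID returned) when no holder remains. */
static void sk_comm_release(sk_comm_t *c)
{
    assert(c->refcount > 0);
    if (--c->refcount == 0) {
        c->destroyed = true;
    }
}
```

The trade-off raised elsewhere in this thread is that while the window's hold is alive, the communicator's ID and its other state (collective module handles, etc.) stay allocated too.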


I do agree with the point of asking whether we want to keep only 
information relevant to all OSC components. Nevertheless, what would 
the difference be between holding the complete communicator and 
holding only the group? Is the group the smallest part common to every 
component?


Clément


On 11/28/2017 07:46 PM, Barrett, Brian via devel wrote:

The following is perfectly legal:

MPI_Comm_dup(some_comm, &tmp_comm);
MPI_Win_create(…, tmp_comm, &window);
MPI_Comm_free(&tmp_comm);  /* legal: window must remain usable afterwards */



So I don’t think stashing away a communicator is the solution.  Is a 
group sufficient?  I think any rational reading of the standard 
would lead to windows needing to hold a group reference for the life 
of the window.  I’d be ok putting a group pointer in the base 
window, if that would work?


Brian

On Nov 28, 2017, at 10:19 AM, George Bosilca wrote:


Hi Brian,

Let me start by explaining why we need the communicator. We need to 
translate local ranks to global ranks (i.e., ranks in 
MPI_COMM_WORLD), so that the communication map we provide makes 
sense. The only way today is to go back to a communicator and then 
translate a rank between this communicator and MPI_COMM_WORLD. We 
could use the gid, but then we would have a hash table lookup for 
every operation.
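One way to avoid a per-operation hash lookup is to build a window-rank to world-rank table once, at window creation. The sketch below assumes hypothetical names (`sk_win_monitor_t` and friends are not the real monitoring component); in real MPI code the input array could be obtained with MPI_Group_translate_ranks between the window's group and the group of MPI_COMM_WORLD.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-window monitoring state: traffic is accumulated per
 * MPI_COMM_WORLD rank so maps from different windows can be combined. */
typedef struct sk_win_monitor {
    int win_size;
    int world_size;
    int *world_rank;          /* world_rank[r]: MPI_COMM_WORLD rank of window rank r */
    unsigned long *bytes_to;  /* bytes sent, indexed by MPI_COMM_WORLD rank */
} sk_win_monitor_t;

/* Build the translation table once, at window creation. */
static void sk_monitor_init(sk_win_monitor_t *m, const int *world_ranks,
                            int win_size, int world_size)
{
    assert(win_size > 0 && world_size > 0);
    m->win_size = win_size;
    m->world_size = world_size;
    m->world_rank = malloc((size_t)win_size * sizeof(*m->world_rank));
    m->bytes_to = calloc((size_t)world_size, sizeof(*m->bytes_to));
    assert(m->world_rank != NULL && m->bytes_to != NULL);
    for (int r = 0; r < win_size; ++r) {
        assert(world_ranks[r] >= 0 && world_ranks[r] < world_size);
        m->world_rank[r] = world_ranks[r];
    }
}

/* Per operation: one array index, no hash-table lookup. */
static void sk_monitor_record(sk_win_monitor_t *m, int target,
                              unsigned long nbytes)
{
    assert(target >= 0 && target < m->win_size);
    m->bytes_to[m->world_rank[target]] += nbytes;
}

static void sk_monitor_fini(sk_win_monitor_t *m)
{
    free(m->world_rank);
    free(m->bytes_to);
}
```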


While a communicator is not needed internally by an OSC component, in 
the MPI world all windows start with a communicator. This is why I was 
proposing the change: not to force a window to create or hold a 
communicator, but simply because the existence of a communicator 
linked to the window is more or less enforced by the MPI standard.


  George.



On Tue, Nov 28, 2017 at 1:02 PM, Barrett, Brian via devel wrote:


The objection I have to this is that it forces an
implementation where every one-sided component is backed by a
communicator.  While that’s the case today, it’s certainly not
required.  If you look at Portals 4, for example, there’s one
collective call outside of initialization, and that’s a barrier
in MPI_WIN_FENCE.  The SM component is the same way, and given
some of the use cases for shared memory allocation using the SM
component, it’s very possible that we’ll be faced with a
situation where creating a communicator per SM region is too
expensive in terms of overall communicator count.

I guess a different question would be what you need the
communicator for.  It shouldn’t have any useful semantic
meaning, so why isn’t it a silent implementation detail of the
monitoring component?

Brian



On Nov 28, 2017, at 8:45 AM, George Bosilca wrote:

Devels,

We would like to change the definition of the OSC module to
move the communicator one level up, from the different module
structures into the base OSC module. The reason for this, as
well as a lengthy discussion of other possible solutions, can
be found in https://github.com/open-mpi/ompi/pull/4527.

We need to make a decision on this ASAP to prepare the PR for
3.1. Please comment ASAP.

  George.

___
devel mailing list
devel@lists.open-mpi.org 
https://lists.open-mpi.org/mailman/listinfo/devel



