Re: [OMPI devel] Changes to classes in OMPI

2013-10-15 Thread George Bosilca

On Oct 11, 2013, at 17:09 , Ralph Castain  wrote:

> 
> On Oct 11, 2013, at 4:07 AM, George Bosilca  wrote:
> 
>> 
>> On Oct 9, 2013, at 15:29 , Ralph Castain  wrote:
>> 
>>> IIRC, the concern was with where the thread safety should reside. Some 
>>> classes (e.g., opal_list) were littered with thread locks for every 
>>> operation. So if someone implemented thread protection at a higher level 
>>> (e.g., protecting the list while cycling thru it), then all these 
>>> lower-level lock/unlock operations were just a waste of cycles.
>> 
>> I tried to find these protections in the basic objects (pal_list_t as you 
>> named it) but I failed. I don't see this being the case in any of the 
>> versions out there (1.6, 1.7 nor trunk). There are some atomic operations to 
>> keep track of the ref counts, but this is a completely different topic.
>> 
>> In the OMPI layer we tried to follow the rule that all calls without 
>> capitals are not thread safe (and are functions), while all calls with 
>> capitals at macros and are protected. This was a best effort applied where 
>> it made sense.
> 
> Only one I could find that has been renamed is ompi_free_list_resize, which 
> has been renamed to ompi_free_list_resize_mt as it includes a lock/unlock in 
> it. However, there are many places in the opal and ompi classes where thread 
> locks are being called - this is what we seek to remove.
> 
> opal/class/opal_pointer_array.c:45:OBJ_CONSTRUCT(&array->lock, 
> opal_mutex_t);
> opal/class/opal_pointer_array.c:67:OBJ_DESTRUCT(&array->lock);
> opal/class/opal_pointer_array.c:113:OPAL_THREAD_LOCK(&(table->lock));
> opal/class/opal_pointer_array.c:120:
> OPAL_THREAD_UNLOCK(&(table->lock));
> opal/class/opal_pointer_array.c:149:OPAL_THREAD_UNLOCK(&(table->lock));
> opal/class/opal_pointer_array.c:171:OPAL_THREAD_LOCK(&(table->lock));
> opal/class/opal_pointer_array.c:175:
> OPAL_THREAD_UNLOCK(&(table->lock));
> opal/class/opal_pointer_array.c:215:OPAL_THREAD_UNLOCK(&(table->lock));
> opal/class/opal_pointer_array.c:248:OPAL_THREAD_LOCK(&(table->lock));
> opal/class/opal_pointer_array.c:251:
> OPAL_THREAD_UNLOCK(&(table->lock));
> opal/class/opal_pointer_array.c:260:
> OPAL_THREAD_UNLOCK(&(table->lock));
> opal/class/opal_pointer_array.c:291:OPAL_THREAD_UNLOCK(&(table->lock));
> opal/class/opal_pointer_array.c:297:OPAL_THREAD_LOCK(&(array->lock));
> opal/class/opal_pointer_array.c:300:
> OPAL_THREAD_UNLOCK(&(array->lock));
> opal/class/opal_pointer_array.c:304:OPAL_THREAD_UNLOCK(&(array->lock));
>> 
>>> However, some people felt that there were places where it helped to have 
>>> the locking down below. So this was the compromise - use the version that 
>>> fits your situation.
>> 
>> In most of the cases there is nothing better we can do down than protecting 
>> the call itself. 
>> 
>>> Personally, I'm not wild about it, but I can live with it. I'd prefer to 
>>> see no lock/unlock calls in the classes themselves as they are too 
>>> atomistic, and would have opted for providing a macro version of the 
>>> function that included the appropriate lock/unlocks around the function.
>> 
>> I'm 100% with you here, I also prefer to see the locks, as this makes errors 
>> easier to spot. This is why I'm concerned about moving them outside the 
>> view, buried under several levels of macro indirections. I could understand 
>> the push if there was an obvious performance or safety benefit, but as I 
>> fail to see I was wondering if I was missing something from the "bigger" 
>> picture.
> 
> Here's how I recollect the discussion. There are thread locks down in many of 
> the opal classes
> - the opal_pointer_array and opal_list functions have embedded lock/unlock in 
> their operations, and I believe others do too.

There are only 3 classes that have locks: pointer array, freelist and ring 
buffer. The opal_list has nothing to do with threads, there are no protections.

> We talked about our desired threading model and agreed that this was too low 
> down in the stack. For example, looping over an opal_list shouldn't invoke a 
> thread lock/unlock for every opal_list_get_next call - we can just lock the 
> loop and avoid all the performance hit. So we agreed on a higher-level thread 
> protection model where we lock up above where the calls are being made.

Thing that can be reached for all existing classes by calling the version 
without capitals. There is one exception, the pointer array which was one of 
these classes with a double history (one in ORTE and one in OMPI). The OMPI 
version needed protection as we use it to make the translation between C and 
Fortran …

> However, someone pointed out that there might be times when locking at the 
> lower level made sense. So we agreed that any function that actually might 
> benefit from internal thread protection would have two variants: _mt that had 
> the locks, and _st that did not. I t

Re: [OMPI devel] Changes to classes in OMPI

2013-10-11 Thread Ralph Castain

On Oct 11, 2013, at 4:07 AM, George Bosilca  wrote:

> 
> On Oct 9, 2013, at 15:29 , Ralph Castain  wrote:
> 
>> IIRC, the concern was with where the thread safety should reside. Some 
>> classes (e.g., opal_list) were littered with thread locks for every 
>> operation. So if someone implemented thread protection at a higher level 
>> (e.g., protecting the list while cycling thru it), then all these 
>> lower-level lock/unlock operations were just a waste of cycles.
> 
> I tried to find these protections in the basic objects (pal_list_t as you 
> named it) but I failed. I don't see this being the case in any of the 
> versions out there (1.6, 1.7 nor trunk). There are some atomic operations to 
> keep track of the ref counts, but this is a completely different topic.
> 
> In the OMPI layer we tried to follow the rule that all calls without capitals 
> are not thread safe (and are functions), while all calls with capitals at 
> macros and are protected. This was a best effort applied where it made sense.

Only one I could find that has been renamed is ompi_free_list_resize, which has 
been renamed to ompi_free_list_resize_mt as it includes a lock/unlock in it. 
However, there are many places in the opal and ompi classes where thread locks 
are being called - this is what we seek to remove.

opal/class/opal_pointer_array.c:45:OBJ_CONSTRUCT(&array->lock, 
opal_mutex_t);
opal/class/opal_pointer_array.c:67:OBJ_DESTRUCT(&array->lock);
opal/class/opal_pointer_array.c:113:OPAL_THREAD_LOCK(&(table->lock));
opal/class/opal_pointer_array.c:120:
OPAL_THREAD_UNLOCK(&(table->lock));
opal/class/opal_pointer_array.c:149:OPAL_THREAD_UNLOCK(&(table->lock));
opal/class/opal_pointer_array.c:171:OPAL_THREAD_LOCK(&(table->lock));
opal/class/opal_pointer_array.c:175:
OPAL_THREAD_UNLOCK(&(table->lock));
opal/class/opal_pointer_array.c:215:OPAL_THREAD_UNLOCK(&(table->lock));
opal/class/opal_pointer_array.c:248:OPAL_THREAD_LOCK(&(table->lock));
opal/class/opal_pointer_array.c:251:OPAL_THREAD_UNLOCK(&(table->lock));
opal/class/opal_pointer_array.c:260:
OPAL_THREAD_UNLOCK(&(table->lock));
opal/class/opal_pointer_array.c:291:OPAL_THREAD_UNLOCK(&(table->lock));
opal/class/opal_pointer_array.c:297:OPAL_THREAD_LOCK(&(array->lock));
opal/class/opal_pointer_array.c:300:
OPAL_THREAD_UNLOCK(&(array->lock));
opal/class/opal_pointer_array.c:304:OPAL_THREAD_UNLOCK(&(array->lock));

> 
>> However, some people felt that there were places where it helped to have the 
>> locking down below. So this was the compromise - use the version that fits 
>> your situation.
> 
> In most of the cases there is nothing better we can do down than protecting 
> the call itself. 
> 
>> Personally, I'm not wild about it, but I can live with it. I'd prefer to see 
>> no lock/unlock calls in the classes themselves as they are too atomistic, 
>> and would have opted for providing a macro version of the function that 
>> included the appropriate lock/unlocks around the function.
> 
> I'm 100% with you here, I also prefer to see the locks, as this makes errors 
> easier to spot. This is why I'm concerned about moving them outside the view, 
> buried under several levels of macro indirections. I could understand the 
> push if there was an obvious performance or safety benefit, but as I fail to 
> see I was wondering if I was missing something from the "bigger" picture.

Here's how I recollect the discussion. There are thread locks down in many of 
the opal classes - the opal_pointer_array and opal_list functions have embedded 
lock/unlock in their operations, and I believe others do too. We talked about 
our desired threading model and agreed that this was too low down in the stack. 
For example, looping over an opal_list shouldn't invoke a thread lock/unlock 
for every opal_list_get_next call - we can just lock the loop and avoid all the 
performance hit. So we agreed on a higher-level thread protection model where 
we lock up above where the calls are being made.

However, someone pointed out that there might be times when locking at the 
lower level made sense. So we agreed that any function that actually might 
benefit from internal thread protection would have two variants: _mt that had 
the locks, and _st that did not. I think the rationale against the macro 
definition is that the lock might occur inside the function - e.g., there might 
be a conditional branch in the function that required the lock, but not the 
entire function. Putting the lock around the function would force it to always 
occur, which letting the lock be inside the function would avoid it unless 
necessary.

Hence the work plan in the meeting minutes. Someone would go into those classes 
and make two copies of each class. The _st version would have all its 
lock/unlock calls removed, while the _mt version would retain them. We would 
then go into the places where those classes are called and either

Re: [OMPI devel] Changes to classes in OMPI

2013-10-11 Thread George Bosilca

On Oct 9, 2013, at 15:29 , Ralph Castain  wrote:

> IIRC, the concern was with where the thread safety should reside. Some 
> classes (e.g., opal_list) were littered with thread locks for every 
> operation. So if someone implemented thread protection at a higher level 
> (e.g., protecting the list while cycling thru it), then all these lower-level 
> lock/unlock operations were just a waste of cycles.

I tried to find these protections in the basic objects (pal_list_t as you named 
it) but I failed. I don't see this being the case in any of the versions out 
there (1.6, 1.7 nor trunk). There are some atomic operations to keep track of 
the ref counts, but this is a completely different topic.

In the OMPI layer we tried to follow the rule that all calls without capitals 
are not thread safe (and are functions), while all calls with capitals at 
macros and are protected. This was a best effort applied where it made sense.

> However, some people felt that there were places where it helped to have the 
> locking down below. So this was the compromise - use the version that fits 
> your situation.

In most of the cases there is nothing better we can do down than protecting the 
call itself. 

> Personally, I'm not wild about it, but I can live with it. I'd prefer to see 
> no lock/unlock calls in the classes themselves as they are too atomistic, and 
> would have opted for providing a macro version of the function that included 
> the appropriate lock/unlocks around the function.

I'm 100% with you here, I also prefer to see the locks, as this makes errors 
easier to spot. This is why I'm concerned about moving them outside the view, 
buried under several levels of macro indirections. I could understand the push 
if there was an obvious performance or safety benefit, but as I fail to see I 
was wondering if I was missing something from the "bigger" picture.

  George.

> 
> Anyway, that was the thinking at the meeting last June.
> 
> 
> On Oct 9, 2013, at 1:40 AM, George Bosilca  wrote:
> 
>> My concern is that increasing the number of interfaces will not make the 
>> code thread safe, as in most cases thread safety is not only a matter of 
>> using a _mt version of the basic class object but a matter of a careful 
>> manipulation of higher level concepts.
>> 
>> We can hardly use the lack of the _MT function as a reason for not having 
>> thread safety in the code. We did have the thread safety a while back 
>> without the support of _MT version of all the basic classes. 
>> 
>> So I really wonder what is the rationale behind such an intrusive change of 
>> the codebase?
>> 
>>   George.
>> 
>> On Oct 8, 2013, at 18:14 , Ralph Castain  wrote:
>> 
>>> Hi folks
>>> 
>>> This was one item from the last devel meeting that can be done independent 
>>> of other things:
>>> 
>>> • resolution: all opal and orte (and possibly ompi) classes 
>>> need to have a thread safe and thread-free interface
>>> • _st suffix: single thread (i.e., not thread safe 
>>> variant)
>>> • _mt suffix: multi thread (i.e., thread safe variant)
>>> • for functions that have both st/mt, they will 
>>> *both* have suffixes
>>> • other functions (that do not have st/mt 
>>> versions) will be naked names
>>> • need to rename all classes that have locking enabled 
>>> already (e.g., opal_free_list)
>>> • so today, we go rename all the functions (e.g., 
>>> opal_free_list functions get _mt suffix) throughout the code base
>>> • as someone needs the _st version, they go create it 
>>> and use it as they want to
>>> • Ralph will do the orte classes
>>> • Aurelien will do this for the ompi classes
>>> 
>>> I believe some of these have been done - I will take care of the ORTE 
>>> classes this week, so consider this a "heads up" for that change.
>>> Ralph
>>> 
>>> ___
>>> devel mailing list
>>> de...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> 
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel



Re: [OMPI devel] Changes to classes in OMPI

2013-10-09 Thread Ralph Castain
IIRC, the concern was with where the thread safety should reside. Some classes 
(e.g., opal_list) were littered with thread locks for every operation. So if 
someone implemented thread protection at a higher level (e.g., protecting the 
list while cycling thru it), then all these lower-level lock/unlock operations 
were just a waste of cycles.

However, some people felt that there were places where it helped to have the 
locking down below. So this was the compromise - use the version that fits your 
situation.

Personally, I'm not wild about it, but I can live with it. I'd prefer to see no 
lock/unlock calls in the classes themselves as they are too atomistic, and 
would have opted for providing a macro version of the function that included 
the appropriate lock/unlocks around the function.

Anyway, that was the thinking at the meeting last June.


On Oct 9, 2013, at 1:40 AM, George Bosilca  wrote:

> My concern is that increasing the number of interfaces will not make the code 
> thread safe, as in most cases thread safety is not only a matter of using a 
> _mt version of the basic class object but a matter of a careful manipulation 
> of higher level concepts.
> 
> We can hardly use the lack of the _MT function as a reason for not having 
> thread safety in the code. We did have the thread safety a while back without 
> the support of _MT version of all the basic classes. 
> 
> So I really wonder what is the rationale behind such an intrusive change of 
> the codebase?
> 
>   George.
> 
> On Oct 8, 2013, at 18:14 , Ralph Castain  wrote:
> 
>> Hi folks
>> 
>> This was one item from the last devel meeting that can be done independent 
>> of other things:
>> 
>>  • resolution: all opal and orte (and possibly ompi) classes 
>> need to have a thread safe and thread-free interface
>>  • _st suffix: single thread (i.e., not thread safe 
>> variant)
>>  • _mt suffix: multi thread (i.e., thread safe variant)
>>  • for functions that have both st/mt, they will 
>> *both* have suffixes
>>  • other functions (that do not have st/mt 
>> versions) will be naked names
>>  • need to rename all classes that have locking enabled 
>> already (e.g., opal_free_list)
>>  • so today, we go rename all the functions (e.g., 
>> opal_free_list functions get _mt suffix) throughout the code base
>>  • as someone needs the _st version, they go create it 
>> and use it as they want to
>>  • Ralph will do the orte classes
>>  • Aurelien will do this for the ompi classes
>> 
>> I believe some of these have been done - I will take care of the ORTE 
>> classes this week, so consider this a "heads up" for that change.
>> Ralph
>> 
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel



Re: [OMPI devel] Changes to classes in OMPI

2013-10-09 Thread George Bosilca
My concern is that increasing the number of interfaces will not make the code 
thread safe, as in most cases thread safety is not only a matter of using a _mt 
version of the basic class object but a matter of a careful manipulation of 
higher level concepts.

We can hardly use the lack of the _MT function as a reason for not having 
thread safety in the code. We did have the thread safety a while back without 
the support of _MT version of all the basic classes. 

So I really wonder what is the rationale behind such an intrusive change of the 
codebase?

  George.

On Oct 8, 2013, at 18:14 , Ralph Castain  wrote:

> Hi folks
> 
> This was one item from the last devel meeting that can be done independent of 
> other things:
> 
>   • resolution: all opal and orte (and possibly ompi) classes 
> need to have a thread safe and thread-free interface
>   • _st suffix: single thread (i.e., not thread safe 
> variant)
>   • _mt suffix: multi thread (i.e., thread safe variant)
>   • for functions that have both st/mt, they will 
> *both* have suffixes
>   • other functions (that do not have st/mt 
> versions) will be naked names
>   • need to rename all classes that have locking enabled 
> already (e.g., opal_free_list)
>   • so today, we go rename all the functions (e.g., 
> opal_free_list functions get _mt suffix) throughout the code base
>   • as someone needs the _st version, they go create it 
> and use it as they want to
>   • Ralph will do the orte classes
>   • Aurelien will do this for the ompi classes
> 
> I believe some of these have been done - I will take care of the ORTE classes 
> this week, so consider this a "heads up" for that change.
> Ralph
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel



[OMPI devel] Changes to classes in OMPI

2013-10-08 Thread Ralph Castain
Hi folks

This was one item from the last devel meeting that can be done independent of 
other things:

• resolution: all opal and orte (and possibly ompi) classes 
need to have a thread safe and thread-free interface
• _st suffix: single thread (i.e., not thread safe 
variant)
• _mt suffix: multi thread (i.e., thread safe variant)
• for functions that have both st/mt, they will 
*both* have suffixes
• other functions (that do not have st/mt 
versions) will be naked names
• need to rename all classes that have locking enabled 
already (e.g., opal_free_list)
• so today, we go rename all the functions (e.g., 
opal_free_list functions get _mt suffix) throughout the code base
• as someone needs the _st version, they go create it 
and use it as they want to
• Ralph will do the orte classes
• Aurelien will do this for the ompi classes

I believe some of these have been done - I will take care of the ORTE classes 
this week, so consider this a "heads up" for that change.
Ralph