Re: [OMPI devel] Changes to classes in OMPI

2013-10-15 Thread George Bosilca

On Oct 11, 2013, at 17:09 , Ralph Castain wrote:

> 
> On Oct 11, 2013, at 4:07 AM, George Bosilca wrote:
> 
>> 
>> On Oct 9, 2013, at 15:29 , Ralph Castain wrote:
>> 
>>> IIRC, the concern was with where the thread safety should reside. Some 
>>> classes (e.g., opal_list) were littered with thread locks for every 
>>> operation. So if someone implemented thread protection at a higher level 
>>> (e.g., protecting the list while cycling thru it), then all these 
>>> lower-level lock/unlock operations were just a waste of cycles.
>> 
>> I tried to find these protections in the basic objects (opal_list_t as you 
>> named it) but I failed. I don't see this being the case in any of the 
>> versions out there (1.6, 1.7, or trunk). There are some atomic operations to 
>> keep track of the ref counts, but this is a completely different topic.
>> 
>> In the OMPI layer we tried to follow the rule that all calls without 
>> capitals are not thread safe (and are functions), while all calls with 
>> capitals are macros and are protected. This was a best effort applied where 
>> it made sense.
> 
> The only one I could find that has been renamed is ompi_free_list_resize, 
> which is now ompi_free_list_resize_mt as it includes a lock/unlock in it. 
> However, there are many places in the opal and ompi classes where thread 
> locks are being called - this is what we seek to remove.
> 
> opal/class/opal_pointer_array.c:45:OBJ_CONSTRUCT(&array->lock, opal_mutex_t);
> opal/class/opal_pointer_array.c:67:OBJ_DESTRUCT(&array->lock);
> opal/class/opal_pointer_array.c:113:OPAL_THREAD_LOCK(&(table->lock));
> opal/class/opal_pointer_array.c:120:OPAL_THREAD_UNLOCK(&(table->lock));
> opal/class/opal_pointer_array.c:149:OPAL_THREAD_UNLOCK(&(table->lock));
> opal/class/opal_pointer_array.c:171:OPAL_THREAD_LOCK(&(table->lock));
> opal/class/opal_pointer_array.c:175:OPAL_THREAD_UNLOCK(&(table->lock));
> opal/class/opal_pointer_array.c:215:OPAL_THREAD_UNLOCK(&(table->lock));
> opal/class/opal_pointer_array.c:248:OPAL_THREAD_LOCK(&(table->lock));
> opal/class/opal_pointer_array.c:251:OPAL_THREAD_UNLOCK(&(table->lock));
> opal/class/opal_pointer_array.c:260:OPAL_THREAD_UNLOCK(&(table->lock));
> opal/class/opal_pointer_array.c:291:OPAL_THREAD_UNLOCK(&(table->lock));
> opal/class/opal_pointer_array.c:297:OPAL_THREAD_LOCK(&(array->lock));
> opal/class/opal_pointer_array.c:300:OPAL_THREAD_UNLOCK(&(array->lock));
> opal/class/opal_pointer_array.c:304:OPAL_THREAD_UNLOCK(&(array->lock));
>> 
>>> However, some people felt that there were places where it helped to have 
>>> the locking down below. So this was the compromise - use the version that 
>>> fits your situation.
>> 
>> In most cases there is nothing better we can do down there than protect 
>> the call itself. 
>> 
>>> Personally, I'm not wild about it, but I can live with it. I'd prefer to 
>>> see no lock/unlock calls in the classes themselves as they are too 
>>> atomistic, and would have opted for providing a macro version of the 
>>> function that included the appropriate lock/unlocks around the function.
>> 
>> I'm 100% with you here. I also prefer to see the locks, as this makes errors 
>> easier to spot. This is why I'm concerned about moving them out of view, 
>> buried under several levels of macro indirection. I could understand the 
>> push if there was an obvious performance or safety benefit, but as I fail 
>> to see one, I was wondering if I was missing something from the "bigger" 
>> picture.
> 
> Here's how I recollect the discussion. There are thread locks down in many of 
> the opal classes: the opal_pointer_array and opal_list functions have 
> embedded lock/unlock calls in their operations, and I believe others do too.

There are only 3 classes that have locks: pointer array, freelist and ring 
buffer. The opal_list has nothing to do with threads; it has no protections.

> We talked about our desired threading model and agreed that this was too low 
> down in the stack. For example, looping over an opal_list shouldn't invoke a 
> thread lock/unlock for every opal_list_get_next call - we can just lock the 
> loop and avoid all the performance hit. So we agreed on a higher-level thread 
> protection model where we lock up above where the calls are being made.

That can be achieved for all existing classes by calling the version 
without capitals. There is one exception: the pointer array, which was one of 
these classes with a double history (one in ORTE and one in OMPI). The OMPI 
version needed protection as we use it to translate between C and 
Fortran …
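
To make the naming rule concrete, here is a minimal sketch of what is meant, 
not code from the tree. Everything named demo_* / DEMO_* is hypothetical, and 
plain pthread mutexes stand in for the OPAL locking macros so the snippet 
compiles on its own. The lowercase call is an unprotected function, the 
capitalized call is a macro that wraps it in lock/unlock, and the last 
function shows a caller taking the lock once around a whole loop instead of 
once per element.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical container for illustration only; not an OPAL/OMPI class. */
#define DEMO_CAPACITY 64

typedef struct {
    pthread_mutex_t lock;      /* used only by callers that want protection */
    int items[DEMO_CAPACITY];
    size_t count;
} demo_array_t;

static void demo_array_init(demo_array_t *a)
{
    assert(a != NULL);
    pthread_mutex_init(&a->lock, NULL);
    a->count = 0;
}

/* Lowercase name: a plain function that takes no lock.
 * Returns 0 on success, -1 if the array is full. */
static int demo_array_add(demo_array_t *a, int value)
{
    if (a->count >= DEMO_CAPACITY) {
        return -1;
    }
    a->items[a->count++] = value;
    return 0;
}

/* Capitalized name: a macro that wraps the same call in lock/unlock. */
#define DEMO_ARRAY_ADD(a, value, rc)             \
    do {                                         \
        pthread_mutex_lock(&(a)->lock);          \
        (rc) = demo_array_add((a), (value));     \
        pthread_mutex_unlock(&(a)->lock);        \
    } while (0)

/* Higher-level protection: lock once around the loop rather than
 * paying for a lock/unlock pair on every element access. */
static long demo_array_sum(demo_array_t *a)
{
    long sum = 0;
    pthread_mutex_lock(&a->lock);
    for (size_t i = 0; i < a->count; ++i) {
        sum += a->items[i];
    }
    pthread_mutex_unlock(&a->lock);
    return sum;
}
```

A caller that already holds the lock (or is single threaded) uses 
demo_array_add directly. A caller that needs per-call protection uses 
DEMO_ARRAY_ADD, and the lock stays visible at the call site through the 
capitalized name.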

> However, someone pointed out that there might be times when locking at the 
> lower level made sense. So we agreed that any function that actually might 
> benefit from internal thread protection would have two variants: _mt that had 
> the locks, and _st that did not. I t

[OMPI devel] 1.7.3: fixed missing Fortran CMR

2013-10-15 Thread Jeff Squyres (jsquyres)
Per the teleconf today: there was a missing Fortran CMR that was causing a 
linker error on the v1.7 branch.  That CMR has now been committed.  In my 
testing, all is now working properly.  ...but I'd like to let it soak through 
the nightly MTT and see what it looks like tomorrow.

Ralph and I chatted about this on the phone, and he's cool with this plan.

-- 
Jeff Squyres
jsquy...@cisco.com