On Nov 14, 2013, at 1:16 PM, Shamis, Pavel <sham...@ornl.gov> wrote:

>> 1. Ralph made the OOB asynchronous.


I pondered this for a while today, and I just want to correct any mistaken 
impression this statement might leave, especially with folks who haven't been 
around the project much over the last couple of years. Just to clarify: this 
wasn't a case of Ralph waking up one day and saying "hey, let's make the OOB 
async!". Quite the contrary.

This whole conversion process started nearly two years ago when we, as a 
community, decided to move towards an async progress model. We laid out all the 
things that we thought would need to be done to make that happen...and then we 
started down that path. First, we updated the event library to the 2.x series 
so we could separate the event bases for the different layers, and so we could 
have event priority levels. Some folks started hardening the BTLs for thread 
safety and adding progress threads inside them. Etc.

One step on that path was to make ORTE operate asynchronously as a purely 
event-driven library. First, we rewrote the state machine so all ORTE 
operations ran in an event, except for the OOB as that can of worms was just 
too hard. Frankly, nobody wanted to touch it, so we left it alone and made 
everything else work.

Finally, I took on the OOB rewrite. One of our continual problems was 
deadlocking because someone would call a blocking send/recv while in an OOB 
callback, usually way down in the stack where it wasn't immediately obvious 
to the user. After spending time fiddling with things, it became clear that 
the only simple solution was to make the OOB totally non-blocking. This also 
made for a much cleaner integration with the rest of the ORTE state machine.
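
To make the deadlock pattern concrete, here is a toy sketch in Python. It is 
not ORTE or libevent code: EventLoop, send_nb, send_blocking, and the callback 
names are all hypothetical, and the "deadlock" is reported as an error instead 
of actually hanging. It assumes a single thread drives all callbacks, so a 
callback that waits for a send to complete is waiting on itself.

```python
from collections import deque


class EventLoop:
    """Toy single-threaded event base (hypothetical; not ORTE or libevent)."""

    def __init__(self):
        self.pending = deque()
        self.in_callback = False

    def post(self, fn, *args):
        # Queue a callback to run on a later pass through the loop.
        self.pending.append((fn, args))

    def run(self):
        # Drain queued callbacks one at a time on the calling thread.
        while self.pending:
            fn, args = self.pending.popleft()
            self.in_callback = True
            try:
                fn(*args)
            finally:
                self.in_callback = False


loop = EventLoop()
log = []


def send_nb(msg, on_complete):
    """Non-blocking send: queue the completion event and return at once."""
    loop.post(on_complete, msg)


def send_blocking(msg):
    """Blocking send: needs the loop to make progress before it can return."""
    if loop.in_callback:
        # The only thread able to complete the send is the one waiting on it.
        raise RuntimeError("would deadlock: blocking send inside a callback")
    done = []
    send_nb(msg, done.append)
    loop.run()
    return done[0]


def on_request(msg):
    log.append(("request", msg))
    # Reply without blocking; completion shows up as a later event.
    send_nb("reply to " + msg, on_reply_sent)


def on_reply_sent(msg):
    log.append(("sent", msg))


def bad_on_request(msg):
    # The problematic pattern: a blocking call buried inside a callback.
    send_blocking("reply to " + msg)


loop.post(on_request, "ping")
loop.run()
```

With only send_nb available, a callback like bad_on_request cannot be written 
at all, which is the point of removing the blocking interfaces.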

So we brought it up at a couple of developer meetings, talked about it a 
number of times on the weekly telecon, and went through several email threads, 
RFCs, etc., with me emphasizing repeatedly that the OOB was going to lose its 
blocking interfaces. The fact that OOB callbacks would be occurring in the 
ORTE event base thread was also discussed, and was one of the reasons why we 
locked libevent thread protection "on" earlier this year. This fact may have 
escaped some people, but it was discussed on several occasions.

The proof of the pudding is that all of the MPI layer has been adapted to the 
new async behavior -except- for the openib CPCs. The issue of what to do with 
these has been raised several times, especially once the ofacm code was 
committed. Unfortunately, a lack of time and competing priorities left this 
code to bit-rot.

I'm not pointing fingers at anyone, nor am I saying this was all perfect. Just 
trying to point out that this was a community move that is part of our 
community roadmap, and we perhaps need to be better at finding a way to keep 
everyone/everything a little more connected to the convoy. This is going to get 
even more rocky in the next year as we push towards full thread safety and 
async progress, and re-implement checkpoint/restart support.

So heads-up...!
Ralph
