On Mon, 2009-06-08 at 17:50 +0200, Sylvain Jeaugey wrote:
> Principle
> =========
>
> opal_progress() ensures the progression of MPI communication. The current
> algorithm is a loop calling progress on all registered components. If the
> program is blocked, the loop will busy-poll indefinitely.

I have some experience here from implementing this feature (blocking
waits) on Quadrics hardware. You're right that it can have benefits, and
yielding the CPU when "idle" is a good thing in the general case.

The "correct" way for a process to relinquish the CPU is to block in a
select() or poll() call until data is received, whereupon it can wake up
and continue working. The major problem each and every MPI implementation
has is that select() only works for TCP/IP and not for shared memory or
any of the more exotic networks. IMHO it would be much preferable to solve
this problem properly and block in a wakeable select() rather than in
usleep().

In my experience, when blocking is done correctly, performance is
affected, but surprisingly often for the better. We had full coverage,
however, so we were able to sleep early and wake up in a timely manner on
receiving any message. Yielding even one CPU per node from the application
occasionally gives background/OS processing a chance to run without
impacting the application, so enabling blocking waits can lead to quicker
runtimes.

> Going to sleep after a certain amount of time with nothing received is
> interesting for two things :
>
> - Administrator can easily detect whether a job is deadlocked : all the
> processes are in sleep(). Currently, all processors are using 100% cpu and
> it is very hard to know if progression is still happening or not.

This is a valuable thing to know, but I don't view the proposed solution
as the correct one. If this were the problem you were aiming to solve, I'd
recommend a different approach, more like the LLNL solution that Ralph
described.

Yours,

Ashley Pittman.

-- 
Ashley Pittman, Bath, UK.

Padb - A parallel job inspection tool for cluster computing
http://padb.pittman.org.uk
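
For illustration, the "wakeable select()" idea mentioned above could be
sketched roughly as follows. This is only a minimal sketch under stated
assumptions, not Open MPI or Quadrics code: the names waker_t, waker_init,
waker_signal and blocking_wait are hypothetical, and it assumes that
whatever notices an event on a transport without a file descriptor (for
example shared memory) is able to write a byte to a pipe that the waiting
process polls alongside its network socket.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper type: a self-pipe used to wake a blocked waiter. */
typedef struct {
    int wake_rd; /* read end, polled by the waiter */
    int wake_wr; /* write end, written when a non-fd event occurs */
} waker_t;

/* Create the pipe and make both ends non-blocking. Returns 0 or -1. */
static int waker_init(waker_t *w)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;
    if (fcntl(fds[0], F_SETFL, O_NONBLOCK) != 0 ||
        fcntl(fds[1], F_SETFL, O_NONBLOCK) != 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    w->wake_rd = fds[0];
    w->wake_wr = fds[1];
    return 0;
}

/* Wake the waiter. If the pipe is full, a wakeup is already pending,
 * so a failed write can be ignored. */
static void waker_signal(waker_t *w)
{
    char byte = 1;
    ssize_t rc = write(w->wake_wr, &byte, 1);
    (void)rc;
}

/* Block until net_fd has an event, the waker fires, or timeout_ms expires.
 * A negative net_fd is ignored by poll(), so the waker can be used alone.
 * Returns a bitmask: 1 = event on net_fd, 2 = woken through the pipe;
 * 0 on timeout, -1 on error. */
static int blocking_wait(waker_t *w, int net_fd, int timeout_ms)
{
    struct pollfd fds[2] = {
        { .fd = net_fd,     .events = POLLIN },
        { .fd = w->wake_rd, .events = POLLIN },
    };
    int n;

    do {
        n = poll(fds, 2, timeout_ms);
    } while (n < 0 && errno == EINTR);
    if (n <= 0)
        return n;

    int result = 0;
    if (fds[0].revents)
        result |= 1;
    if (fds[1].revents & POLLIN) {
        char buf[64];
        /* Drain all pending wakeup bytes so the next wait blocks again. */
        while (read(w->wake_rd, buf, sizeof buf) > 0)
            ;
        result |= 2;
    }
    return result;
}
```

A progress loop would call blocking_wait() only after polling has found
nothing for a while, and would resume normal progress as soon as it
returns non-zero. The hard part the sketch does not address is the one
raised above: arranging for every transport to signal the waiter reliably.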