Some very good points in this thread all round.

On Mon, 2009-08-17 at 09:00 -0400, Jeff Squyres wrote:
>
> This is probably not too surprising (i.e., allowing the OS to move
> jobs around between cores on a socket can probably involve a little
> cache thrashing, resulting in that 5-10% loss). I'm hand-waving here,
> and I have not tried this myself, but it's not too surprising of a
> result to me. It's also not too surprising that others don't see this
> effect at all (e.g., Sun didn't see any difference in bind-to-core vs.
> bind-to-socket) when they ran their tests. YMMV.
>
> I'd actually be in favor of a by-core binding (not by-socket), but
> spreading the processes out round robin by socket, not by core. All
> of this would be the *default* behavior, of course -- command line
> params/MCA params will be provided to change to whatever pattern is
> desired.
I'm in favour of by-core binding. When it's done correctly I've seen
results that tie in with Ralph's 5-10% figure; when it's done incorrectly,
however, it can be atrocious. The kernel scheduler may not be perfect,
but at least it's never bad.

One (small) point nobody has mentioned yet is that when using round-robin
core binding, some applications prefer you to round-robin by socket and
some prefer you to round-robin by core. This will depend on their level
of comms and any cache-sharing benefits. Perhaps this is the reason Ralph
saw improvements but Sun didn't? A small sketch of the two mappings
follows below.

Ashley.

--
Ashley Pittman, Bath, UK.

Padb - A parallel job inspection tool for cluster computing
http://padb.pittman.org.uk
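For illustration, here is a minimal sketch of the difference between the
two mappings on a hypothetical two-socket, four-cores-per-socket Linux
node. The topology constants, the linear core numbering, and the direct
sched_setaffinity() call are all assumptions made for the example; Open
MPI's own binding goes through its processor-affinity framework and
discovers the real topology rather than hard-coding it.

    /* rank_bind.c -- illustrative only: assumes a 2-socket, 4-core
     * node with cores 0-3 on socket 0 and 4-7 on socket 1.  A real
     * tool would query the topology instead of hard-coding it. */
    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define NUM_SOCKETS      2
    #define CORES_PER_SOCKET 4

    /* Pick a core for an MPI rank under the two round-robin policies. */
    static int core_for_rank(int rank, int by_socket)
    {
        if (by_socket) {
            /* by-socket: consecutive ranks alternate between sockets,
             * so neighbouring ranks each get their own cache. */
            int socket = rank % NUM_SOCKETS;
            int slot   = (rank / NUM_SOCKETS) % CORES_PER_SOCKET;
            return socket * CORES_PER_SOCKET + slot;
        }
        /* by-core: fill socket 0 completely before moving to socket 1,
         * so neighbouring ranks share a socket (and its cache). */
        return rank % (NUM_SOCKETS * CORES_PER_SOCKET);
    }

    int main(int argc, char **argv)
    {
        int rank      = (argc > 1) ? atoi(argv[1]) : 0;
        int by_socket = (argc > 2) ? atoi(argv[2]) : 0;
        int core      = core_for_rank(rank, by_socket);

        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(core, &set);
        if (sched_setaffinity(0, sizeof(set), &set) != 0) {
            perror("sched_setaffinity");
            return 1;
        }
        printf("rank %d -> core %d (%s)\n", rank, core,
               by_socket ? "round-robin by socket" : "round-robin by core");
        return 0;
    }

With eight ranks the by-core mapping places ranks 0-3 on socket 0 and
ranks 4-7 on socket 1, while the by-socket mapping interleaves them
(0, 4, 1, 5, ...). That is exactly the distinction that decides whether
neighbouring ranks share a cache or compete for one, which is why some
applications prefer one mapping and some the other.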