Greetings all. I'm writing this to ask for help from the general development community. We've run into a problem with Linux processor affinity, and although I've individually talked to a lot of people about this, no one has been able to come up with a solution. So I thought I'd open this to a wider audience.

This is a long-ish e-mail; bear with me.

As you may or may not know, Open MPI includes support for processor and memory affinity. There are a number of benefits, but I'll skip that discussion for now. For more information, see the following:

http://www.open-mpi.org/faq/?category=building#build-paffinity
http://www.open-mpi.org/faq/?category=building#build-maffinity
http://www.open-mpi.org/faq/?category=tuning#paffinity-defs
http://www.open-mpi.org/faq/?category=tuning#maffinity-defs
http://www.open-mpi.org/faq/?category=tuning#using-paffinity

Here's the problem: there are 3 different APIs for processor affinity in Linux. I have not done exhaustive research on this, but which API you have seems to depend on your kernel version, glibc version, and/or Linux vendor (i.e., some vendors appear to port different versions of the API to their particular kernel/glibc). The issue is that all 3 versions of the API use the same function names (sched_setaffinity() and sched_getaffinity()), but they differ in the number and types of the parameters to these functions.
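For the curious, here's roughly what the three variants look like (this is from memory, so take the exact details with a grain of salt; the point is that only one of these is ever declared in <sched.h> on any given system, and code written against one of the others won't compile or won't behave against it):

    /* Illustration only -- three flavors of the same two function names.
       Only one flavor exists on a given system. */

    /* Flavor 1: length in bytes + bare unsigned long bitmask
       (older kernels/glibc) */
    int sched_setaffinity(pid_t pid, unsigned int len, unsigned long *mask);
    int sched_getaffinity(pid_t pid, unsigned int len, unsigned long *mask);

    /* Flavor 2: no length argument at all (seen with some glibc 2.3.3 builds) */
    int sched_setaffinity(pid_t pid, const cpu_set_t *mask);
    int sched_getaffinity(pid_t pid, cpu_set_t *mask);

    /* Flavor 3: size of the cpu_set_t + cpu_set_t pointer (current glibc) */
    int sched_setaffinity(pid_t pid, size_t cpusetsize, const cpu_set_t *mask);
    int sched_getaffinity(pid_t pid, size_t cpusetsize, cpu_set_t *mask);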

This is not a big problem for source distributions of Open MPI -- our configure script figures out which one you have and uses preprocessor directives to select the Right stuff in our code base for your platform.
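As a rough sketch of what the source build ends up doing (the HAVE_* macro names below are made up for this example -- the real configure-generated symbols are named differently):

    #define _GNU_SOURCE
    #include <sched.h>       /* sched_setaffinity(), cpu_set_t, CPU_* macros */
    #include <sys/types.h>   /* pid_t */

    /* Sketch only: configure decides which one of these macros is set,
       so exactly one branch is compiled for the build platform. */
    static int bind_to_cpu(pid_t pid, int cpu)
    {
    #if HAVE_SCHED_AFFINITY_CPUSET_3ARG
        /* Flavor 3: (pid, size_t, cpu_set_t *) */
        cpu_set_t mask;
        CPU_ZERO(&mask);
        CPU_SET(cpu, &mask);
        return sched_setaffinity(pid, sizeof(mask), &mask);
    #elif HAVE_SCHED_AFFINITY_CPUSET_2ARG
        /* Flavor 2: (pid, cpu_set_t *) */
        cpu_set_t mask;
        CPU_ZERO(&mask);
        CPU_SET(cpu, &mask);
        return sched_setaffinity(pid, &mask);
    #else
        /* Flavor 1: (pid, unsigned int, unsigned long *) */
        unsigned long mask = 1UL << cpu;
        return sched_setaffinity(pid, sizeof(mask), &mask);
    #endif
    }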

What *is* a big problem, however, is that ISVs therefore cannot ship a binary Open MPI installation and reasonably expect its processor affinity support to work across multiple Linux platforms. That is, if an ISV compiles for API #X and ships the binary to a system that has API #Y, there are two options:

1. Processor affinity is disabled. This means that the benefits of processor affinity won't be visible (not hugely important on 2-way SMPs, but increasingly important as the number of processors/cores grows), and Open MPI's NUMA-aware collectives can't be used (because memory affinity may not be useful without processor affinity guarantees).

2. Processor affinity is enabled, but the code invokes API #X on a system with API #Y. This has unpredictable results: the best case is that processor affinity is simply [effectively] ignored; the worst case is that the application fails (e.g., seg faults).

Clearly, neither of these options is attractive.

My question to the developer crowd out there: can you think of a way around this? More specifically, is there a way to know -- at run time -- which API to use? We can do some compiler trickery to compile all three APIs into a single Open MPI installation and then dispatch at run time to the Right one, but this is contingent upon being able to determine which API to dispatch to. A bunch of us have poked around the system (e.g., in /proc and /sys) but have not found anything that indicates which API you have.
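To make the run-time dispatch idea concrete, here's a hypothetical skeleton (all the names below are invented for illustration). The dispatch itself is easy; detect_affinity_api() is exactly the piece we don't know how to write:

    #include <stddef.h>      /* NULL */
    #include <sys/types.h>   /* pid_t */

    /* One wrapper per API flavor, each compiled in its own file against
       the matching prototype (same kind of preprocessor trickery as above). */
    extern int set_affinity_flavor1(pid_t pid, int cpu);
    extern int set_affinity_flavor2(pid_t pid, int cpu);
    extern int set_affinity_flavor3(pid_t pid, int cpu);

    /* The open question: how to implement this reliably at run time. */
    extern int detect_affinity_api(void);

    typedef int (*set_affinity_fn_t)(pid_t, int);
    static set_affinity_fn_t set_affinity_fn = NULL;

    int set_processor_affinity(pid_t pid, int cpu)
    {
        if (NULL == set_affinity_fn) {
            switch (detect_affinity_api()) {
            case 1:  set_affinity_fn = set_affinity_flavor1; break;
            case 2:  set_affinity_fn = set_affinity_flavor2; break;
            default: set_affinity_fn = set_affinity_flavor3; break;
            }
        }
        return set_affinity_fn(pid, cpu);
    }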

Does anyone have any suggestions here?

Many thanks for your time.

--
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/
