Did a little digging into this last night, and finally figured out what you were getting at in your comments here. Yeah, I think an "affinity" framework would definitely be the best approach - it can handle both CPU and memory, I imagine. It isn't clear how pressing this is, since it's mostly an optimization issue, but you're welcome to create the framework if you like.
On Sun, 2005-07-17 at 09:13, Jeff Squyres wrote:
> It needs to be done in the launched process itself. So we'd either
> have to extend rmaps (from my understanding of rmaps, that doesn't
> seem like a good idea), or do something different.
>
> Perhaps the easiest thing to do is to add this to the LANL meeting
> agenda...? Then we can have a whiteboard to discuss. :-)
>
> On Jul 17, 2005, at 10:26 AM, Ralph Castain wrote:
>
> > Wouldn't it belong in the rmaps framework? That's where we tell the
> > launcher where to put each process - seems like a natural fit.
> >
> > On Jul 17, 2005, at 6:45 AM, Jeff Squyres wrote:
> >
> >> I'm thinking that we should add some processor affinity code to
> >> OMPI -- possibly in the orte layer (ORTE is the interface to the
> >> back-end launcher, after all). This will really help on systems
> >> like Opterons (and others) to prevent processes from bouncing
> >> between processors, and potentially getting located far from
> >> "their" RAM.
> >>
> >> This has the potential to help even micro-benchmark results (e.g.,
> >> ping-pong). It's going to be quite relevant for my shared memory
> >> collective work on mauve.
> >>
> >>
> >> General scheme:
> >> ---------------
> >>
> >> I think that somewhere in ORTE, we should actively set processor
> >> affinity when:
> >> - Supported by the OS
> >> - Not disabled by the user (via MCA param)
> >> - The node is not over-subscribed with processes from this job
> >>
> >> Generally speaking, if you launch <= N processes in a job on a node
> >> (where N == the number of CPUs on that node), then we set processor
> >> affinity. We set each process's affinity to the CPU number
> >> according to the VPID ordering of the procs in that job on that
> >> node. So if you launch VPIDs 5, 6, 7, 8 on a node, 5 would go to
> >> processor 0, 6 would go to processor 1, etc. (it's an easy,
> >> locally-determined ordering).
> >>
> >> Someday, we might want to make this scheme universe-aware (i.e.,
> >> see if any other ORTE jobs are running on that node, and not
> >> schedule on any processors that are already claimed by the
> >> processes in those jobs), but I think single-job awareness is
> >> sufficient for the moment.
> >>
> >>
> >> Implementation:
> >> ---------------
> >>
> >> We'll need relevant configure tests to figure out if the target
> >> system has CPU affinity system calls. Those are simple to add.
> >>
> >> We could simply use #if statements for the affinity stuff, or make
> >> it a real framework. Since it's only one function call to set the
> >> affinity, I tend to lean towards the [simpler] #if solution, but
> >> could probably be pretty easily convinced that a framework is the
> >> Right solution. I'm on the fence (and if someone convinces me, I'd
> >> volunteer for the extra work to set up the framework).
> >>
> >> I'm not super-familiar with the processor-affinity stuff (e.g., for
> >> best effect, should it be done after the fork and before the
> >> exec?), so I'm not sure exactly where this would go in ORTE.
> >> Potentially either before new processes are exec'd (we only have
> >> control of that in some kinds of systems, like rsh/ssh) or right up
> >> very near the top of orte_init().
> >>
> >> Comments?
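For concreteness, here is a minimal sketch of the scheme described above, assuming Linux's sched_setaffinity(): guard the call with a configure-detected macro, skip binding when the node is over-subscribed, and bind each process to the CPU matching its local VPID ordering. The HAVE_SCHED_SETAFFINITY macro (e.g., from an AC_CHECK_FUNCS test) and the local_rank/local_nprocs parameters are hypothetical stand-ins for whatever configure and ORTE would actually provide -- this is not Open MPI's real code.

    #define _GNU_SOURCE
    #include <sched.h>      /* sched_setaffinity(), cpu_set_t, CPU_SET */
    #include <unistd.h>     /* sysconf() */
    #include <stdio.h>      /* perror() */

    /* Bind the calling process to one CPU, chosen by its local rank.
     * Returns 0 on success (or if binding was skipped), -1 on error. */
    static int bind_to_cpu_by_local_rank(int local_rank, int local_nprocs)
    {
    #ifdef HAVE_SCHED_SETAFFINITY
        long ncpus = sysconf(_SC_NPROCESSORS_ONLN);
        cpu_set_t mask;

        /* Per the proposed policy: only bind when this job has
         * <= N procs on this node (N == number of CPUs). */
        if (ncpus <= 0 || local_nprocs > ncpus) {
            return 0;
        }

        CPU_ZERO(&mask);
        CPU_SET(local_rank % (int) ncpus, &mask);

        /* pid 0 means "the calling process" */
        if (sched_setaffinity(0, sizeof(mask), &mask) != 0) {
            perror("sched_setaffinity");
            return -1;
        }
    #else
        (void) local_rank;      /* no affinity support found by configure */
        (void) local_nprocs;
    #endif
        return 0;
    }

    int main(void)
    {
        /* Illustration only: pretend we are the 2nd of 2 local procs. */
        return (0 == bind_to_cpu_by_local_rank(1, 2)) ? 0 : 1;
    }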
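As for the "after the fork and before the exec" question: on Linux, the affinity mask is inherited across both fork() and exec(), so a launcher that forks its children directly (the rsh/ssh-style case) can bind each child between fork() and execvp(), and the binding sticks in the exec'd process. A minimal sketch, again with made-up names (launch_bound() is not a real ORTE function):

    #define _GNU_SOURCE
    #include <sched.h>      /* cpu_set_t, sched_setaffinity() */
    #include <unistd.h>     /* fork(), execvp(), _exit() */
    #include <sys/types.h>  /* pid_t */

    /* Fork a child, pin it to the given CPU, then exec the command.
     * The affinity mask set here survives the execvp() call. */
    static pid_t launch_bound(char *const argv[], int cpu)
    {
        pid_t pid = fork();

        if (pid == 0) {                               /* child */
            cpu_set_t mask;
            CPU_ZERO(&mask);
            CPU_SET(cpu, &mask);
            (void) sched_setaffinity(0, sizeof(mask), &mask);
            execvp(argv[0], argv);                    /* keeps the mask */
            _exit(127);                               /* exec failed */
        }

        return pid;   /* parent: child pid, or -1 if fork() failed */
    }

    int main(void)
    {
        /* Illustration: run "hostname" bound to CPU 0. */
        char *args[] = { "hostname", NULL };
        launch_bound(args, 0);
        return 0;
    }

The alternative placement (near the top of orte_init()) would instead call something like the bind_to_cpu_by_local_rank() sketch above from within the launched process itself, which works regardless of how the process was started.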