Jeff Squyres wrote:

On Nov 9, 2007, at 1:24 PM, Don Kerr wrote:

Both. I was thinking of listing what I think are the multi-rail requirements, but first wanted to understand what the current state of things is.

I believe the OF portion of the FAQ describes what we do in the v1.2 series (right Gleb?); I honestly don't remember what we do today on the trunk (I'm pretty sure that Gleb has tweaked it recently).
Gleb's response answered this.

As for what we *should* do, it's a very complicated question.  :-\
OK. I knew that "closeness to the NIC" was a concern, but was not aware that an attempt to tackle it had begun. I will look at the "carto" framework.

Thanks
-DON

This is where all these discussions regarding affinity, NUMA, and NUNA (non uniform network architecture) come into play. A "very simple" scenario may be something like this:

- host A is UMA (perhaps even a uniprocessor) with 2 ports that are equidistant from the 1 MPI process on that host
- host B is the same, except it only has 1 active port on the same IB subnet as host A's 2 ports
- the ports on both hosts are all the same speed (e.g., DDR)
- the ports all share a single, common, non-blocking switch

But even with this "simple" case, the answer as to what you should do is still unclear. If host A is able to drive both of its DDR links at full speed, you could cause congestion at the link to host B if the MPI process on host A opens two connections. But if host A is only able to drive the same effective bandwidth out of its two ports as it is through a single port, then the end effect is probably fairly negligible -- it might not make much of a difference whether the MPI process on host A opens 1 or 2 connections to host B.
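To make that tradeoff concrete, here is a tiny sketch in C of the kind of heuristic a BTL might apply. This is illustration only, not Open MPI code; the function name and parameters are invented. The idea is simply: stripe across multiple ports only when the host can actually drive more aggregate bandwidth than a single port carries.

```c
/* Illustrative heuristic only -- not Open MPI code.  Decide how many
 * connections an MPI process on a multi-port host should open to a
 * peer, given per-port bandwidth and the aggregate bandwidth the host
 * can actually drive (all names and parameters are hypothetical). */
static int num_connections(int local_active_ports,
                           double port_gbps,
                           double host_drive_gbps)
{
    if (local_active_ports < 2) {
        return 1;                   /* nothing to stripe across */
    }
    /* If the host cannot push more than one port's worth of data,
     * a second connection buys little and may just add congestion
     * at a single-port peer like host B above. */
    if (host_drive_gbps <= port_gbps) {
        return 1;
    }
    return local_active_ports;      /* stripe across all active ports */
}
```

For example, with 2 DDR ports and a host able to drive twice the per-port rate, this heuristic would open 2 connections; if the host tops out at one port's worth of bandwidth, only 1.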

But then throw in other effects that I mentioned above (NUMA, NUNA, etc.), and the equation becomes much more complex. In some cases, it may be good to open 1 connection (e.g., bandwidth load balancing); in other cases it may be good to open 2 (e.g., congestion avoidance / spreading traffic around the network, particularly in the presence of other MPI jobs on the network). :-\

Such NUNA architectures may sound unusual to some, but both IBM and HP sell [many] blade-based HPC solutions with NUNA internal IB networks. Specifically: this is a fairly common scenario.

So this is a difficult question without a great answer. The hope is that the new carto framework (for which Sharon recently sent around requirements) will at least make topology information available for both the host and the network, so that BTLs can possibly make some intelligent decisions about what to do in these kinds of scenarios.
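As a hedged sketch of what that might enable (the carto interface was still being specified at the time, so the distance-array representation here is purely hypothetical), a BTL could rank ports by their topological distance from the calling process:

```c
/* Hypothetical sketch -- the real carto API may look nothing like
 * this.  Given per-port distances from the calling MPI process, as
 * some topology framework might report them, return the index of
 * the closest port. */
static int closest_port(const int *distance, int nports)
{
    int best = 0;
    for (int i = 1; i < nports; ++i) {
        if (distance[i] < distance[best]) {
            best = i;
        }
    }
    return best;
}
```

On a NUNA blade chassis, such distances could distinguish a port on the local switch hop from one several hops away, which is exactly the information the BTL lacks today.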
