Re: [OMPI devel] PML selection logic

Aurélien Bouteiller Mon, 23 Jun 2008 13:59:39 -0400

The first approach sounds fair enough to me. We should avoid 2 and 3, as the PML selection mechanism used to be more complex before we reduced it to accommodate a major design bug in the BTL selection process. When using the complete PML selection, BTL would be initialized several times, leading to a variety of bugs. Eventually the PML selection should return to its old self, when the BTL bug gets fixed.


Aurelien


On 23 June 2008, at 12:36, Ralph H Castain wrote:

Yo all

I've been doing further research into the modex and came across
something I don't fully understand. It seems we have each process insert
into the modex the name of the PML module that it selected. Once the
modex has exchanged that info, it then loops across all procs in the job
to check their selection, and aborts if any proc picked a different PML
module.
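
The check described above can be sketched roughly as follows. This is a minimal Python illustration, not Open MPI code: the function name, the `PmlMismatchError` exception, and the dict layout for the modex data are all assumptions made for the example.

```python
class PmlMismatchError(RuntimeError):
    """Hypothetical error standing in for the job abort."""


def check_pml_selection(my_pml, modex_pml_names):
    """Compare this proc's PML name with every name received via the modex.

    modex_pml_names: dict mapping rank -> PML module name (assumed layout).
    """
    for rank in sorted(modex_pml_names):
        peer_pml = modex_pml_names[rank]
        if peer_pml != my_pml:
            # In the real library this is where the job would abort.
            raise PmlMismatchError(
                f"rank {rank} selected PML '{peer_pml}', expected '{my_pml}'"
            )


# Every proc picked ob1, so the check passes silently.
check_pml_selection("ob1", {0: "ob1", 1: "ob1", 2: "ob1"})
```

Note that every proc has to publish its PML name and every proc loops over all of them, which is the cost the rest of this message questions.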

All well and good...assuming that procs actually -can- choose different
PML modules and hence create an "abort" scenario. However, if I look
inside the PMLs at their selection logic, I find that a proc can ONLY
pick a module other than ob1 if:

1. the user specifies the module to use via -mca pml xyz or by using a
   module-specific mca param to adjust its priority. In this case, since
   the mca param is propagated, ALL procs have no choice but to pick that
   same module, so that can't cause us to abort (we will have already
   returned an error and aborted if the specified module can't run).

2. the pml/cm module detects that an MTL module was selected, and that
   it is other than "psm". In this case, the CM module will be selected
   because its default priority is higher than that of OB1.

In looking deeper into the MTL selection logic, it appears to me that
you either have the required capability or you don't. I can see that in
some environments (e.g., rsh across unmanaged collections of machines),
it might be possible for someone to launch across a set of machines
where some do and some don't have the required support. However, in all
other cases, this will be homogeneous across the system.
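
The two cases above can be condensed into a small sketch of the selection rule as described in this message. It is only an illustration in Python: the priority numbers are placeholders (the message only says that CM's default priority is higher than OB1's), and `select_pml` and the MTL name "other_mtl" are hypothetical.

```python
# Placeholder priorities: only the ordering cm > ob1 comes from the message.
OB1_PRIORITY = 20
CM_PRIORITY = 30


def select_pml(user_pml=None, mtl=None):
    """Return the PML a proc would pick under the rules described above.

    user_pml: module forced by the user (e.g. via -mca pml xyz), or None.
    mtl: name of the MTL module that was selected on this node, or None.
    """
    if user_pml is not None:
        # The mca param is propagated, so every proc takes this branch.
        return user_pml
    candidates = {"ob1": OB1_PRIORITY}
    if mtl is not None and mtl != "psm":
        # CM only becomes a candidate when a non-psm MTL is available.
        candidates["cm"] = CM_PRIORITY
    return max(candidates, key=candidates.get)


print(select_pml())                  # no MTL available
print(select_pml(mtl="other_mtl"))   # a non-psm MTL was selected
```

Under this model, two procs can only disagree when `mtl` differs between nodes, which is the heterogeneous rsh-style scenario mentioned above.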

Given this analysis (and someone more familiar with the PML should feel
free to confirm or correct it), it seems to me that this could be
streamlined via one or more means:

1. at the most, we could have rank=0 add the PML module name to the
   modex, and other procs simply check it against their own and return
   an error if they differ. This accomplishes the identical
   functionality to what we have today, but with much less info in the
   modex.

2. we could eliminate this info from the modex altogether by requiring
   the user to specify the PML module if they want something other than
   the default OB1. In this case, there can be no confusion over what
   each proc is to use. The CM module will attempt to init the MTL - if
   it cannot do so, then the job will return the correct error and tell
   the user that CM/MTL support is unavailable.

3. we could again eliminate the info by not inserting it into the modex
   if (a) the default PML module is selected, or (b) the user specified
   the PML module to be used. In the first case, each proc can simply
   check to see if they picked the default - if not, then we can insert
   the info to indicate the difference. Thus, in the "standard" case, no
   info will be inserted. In the second case, we will already get an
   error if the specified PML module could not be used. Hence, the modex
   check provides no additional info or value.
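
To make the difference in modex traffic concrete, here is a rough Python sketch of what options 1 and 3 would publish. All function names are hypothetical, and returning `None` is simply this example's way of saying "insert nothing into the modex".

```python
DEFAULT_PML = "ob1"


def modex_entry_option1(rank, my_pml):
    """Option 1: only rank 0 publishes its PML name."""
    return my_pml if rank == 0 else None


def matches_rank0(my_pml, rank0_pml):
    """Option 1 check: each proc compares itself against rank 0 only."""
    return my_pml == rank0_pml


def modex_entry_option3(my_pml, user_specified):
    """Option 3: publish only a non-default choice the user did not force."""
    if user_specified or my_pml == DEFAULT_PML:
        return None
    return my_pml


# In the "standard" case (default PML everywhere) option 3 publishes nothing.
entries = [modex_entry_option3("ob1", user_specified=False) for _ in range(4)]
print(entries)
```

In both variants the amount of data placed in the modex no longer grows with one PML name per proc in the common case.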

I understand the motivation to support automation. However, in this
case, the automation actually doesn't seem to buy us very much, and it
isn't coming "free". So perhaps some change in how this is done would be
in order?

Ralph


