Good catch. I fixed the TCP BTL (r31753). It is the only BTL I can test so that's the most I can do here.
However, I never get OPAL_ERR_DATA_VALUE_NOT_FOUND out of the modex call when the key doesn't exist. I looked in dstore, and the correct value to look for is OPAL_ERR_NOT_FOUND. I guess you might want to revise the check in the USNIC.

  George.

PS: There is an easy way to test this particular case by using the MPMD capabilities of mpiexec. As an example, for a quick NetPIPE run between two processes, one supporting SM and TCP and one supporting only SM (I ignored self here), you can do:

mpirun -np 1 --mca btl tcp,sm,self ./NPmpi -l 5 -u 5 : -np 1 --mca btl sm,self ./NPmpi -l 5 -u 5

On Tue, May 13, 2014 at 2:09 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:

> I notice that BTLs are not checking the return value from ompi_modex_recv()
> for OPAL_ERR_DATA_VALUE_NOT_FOUND (indicating that the peer process didn't
> put that modex key). In the BTL context, NOT_FOUND means that that peer
> process doesn't have this BTL, so this local peer process should probably
> mark it as unreachable in add_procs().
>
> This is on both trunk and the v1.8 branch.
>
> The BTLs listed above are not checking/handling ompi_modex_recv() returning
> OPAL_ERR_DATA_VALUE_NOT_FOUND properly. Most of these BTLs do something like
> this:
>
> -----
> module_add_procs() {
>     loop over the peers {
>         proc = proc_create(...)
>         if (NULL == proc)
>             error!
>         ....
>     }
> }
>
> proc_create(...) {
>     if (ompi_modex_recv() != OMPI_SUCCESS)
>         return NULL;
>     ...
> }
> -----
>
> The fix is to make proc_create() return something a bit more expressive so
> that add_procs() can tell the difference between "error!" and "you can't
> reach this peer".
>
> I fixed this in the usnic BTL back in late March, but forgot to bring this to
> everyone's attention -- oops.
> See https://svn.open-mpi.org/trac/ompi/ticket/4442
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/05/14783.php
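
For anyone patching another BTL, here is a rough, self-contained sketch of the shape of the fix described in the quoted message: proc_create() hands back a status code instead of NULL, so add_procs() can tell "peer doesn't have this BTL" apart from a real error. This is not the actual usnic or TCP change. Every sketch_* name and the numeric codes are made up for illustration, and sketch_modex_recv() is a fake lookup standing in for ompi_modex_recv(); a real BTL would compare against OPAL_ERR_NOT_FOUND (per the dstore observation above) from the Open MPI headers.

```c
#include <stddef.h>

/* Stand-in status codes (assumption: the real values come from Open MPI). */
enum {
    SKETCH_SUCCESS       =  0,
    SKETCH_ERR_NOT_FOUND = -1,  /* peer never published the modex key */
    SKETCH_ERROR         = -2   /* any other failure */
};

/* Hypothetical per-peer endpoint state. */
typedef struct {
    int peer;
} sketch_proc_t;

/* Fake modex lookup: negative peers fail outright, odd peers have no key. */
static int sketch_modex_recv(int peer)
{
    if (peer < 0) {
        return SKETCH_ERROR;
    }
    if (peer % 2 != 0) {
        return SKETCH_ERR_NOT_FOUND;
    }
    return SKETCH_SUCCESS;
}

/* Return a status code instead of NULL so the caller sees why it failed. */
static int sketch_proc_create(int peer, sketch_proc_t *proc)
{
    int rc = sketch_modex_recv(peer);
    if (SKETCH_SUCCESS != rc) {
        return rc;  /* NOT_FOUND is passed through unchanged */
    }
    proc->peer = peer;
    return SKETCH_SUCCESS;
}

/* Mark peers without the key as unreachable; abort only on real errors. */
static int sketch_add_procs(const int *peers, size_t npeers, int *reachable)
{
    for (size_t i = 0; i < npeers; ++i) {
        sketch_proc_t proc;
        int rc = sketch_proc_create(peers[i], &proc);

        if (SKETCH_ERR_NOT_FOUND == rc) {
            reachable[i] = 0;  /* peer doesn't have this BTL: skip it */
            continue;
        }
        if (SKETCH_SUCCESS != rc) {
            return rc;         /* genuine failure: propagate */
        }
        reachable[i] = 1;
    }
    return SKETCH_SUCCESS;
}
```

Calling sketch_add_procs() on peers {0, 1, 2} succeeds and leaves peer 1 marked unreachable, while a list containing a negative peer makes the whole call return SKETCH_ERROR.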