You're right, the code was overzealous. I fixed it by removing the parsing of the modex data completely. In any case, the collective module has another chance to deselect itself upon creation of a new communicator (i.e., after the modex has completed).
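For the record, here is a minimal sketch of the two-phase selection (made-up names: PROC_FLAG_LOCAL, sm_init_query, and sm_comm_query are illustrative stand-ins, not the real OMPI symbols; the actual flag is more like OPAL_PROC_ON_LOCAL_NODE):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the locality flag; not the real OMPI macro. */
#define PROC_FLAG_LOCAL 0x1u

typedef struct {
    unsigned flags;   /* locality flags, only reliable after the modex */
} proc_t;

/* Phase 1: called during MPI_Init, before the modex. Peer locality is
 * unknown here, so the only safe answer is "maybe available" - never
 * inspect peer proc flags at this point (that was the bug). */
static bool sm_init_query(void)
{
    return true;
}

/* Phase 2: called per communicator, after the modex. Flags are now
 * populated, so the component can deselect itself if any peer is
 * off-node. */
static bool sm_comm_query(const proc_t *procs, size_t nprocs)
{
    for (size_t i = 0; i < nprocs; ++i) {
        if (!(procs[i].flags & PROC_FLAG_LOCAL)) {
            return false;
        }
    }
    return true;
}
```

The point is that init_query must not look at peer flags, because they are only populated once the modex completes; the per-communicator query is the right place for the locality check.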
George

On Jul 6, 2012, at 2:20, Ralph Castain <rhc.open...@gmail.com> wrote:

> George: is there any reason for opening and selecting the coll framework so early in mpi_init? I'm wondering if we can move that code to the end of the procedure so we wouldn't need the locality info until later.
>
> Sent from my iPad
>
> On Jul 5, 2012, at 10:05 AM, Jeff Squyres <jsquy...@cisco.com> wrote:
>
>> Thanks George. I filed https://svn.open-mpi.org/trac/ompi/ticket/3162 about this.
>>
>> On Jul 4, 2012, at 5:34 AM, Juan A. Rico wrote:
>>
>>> Thanks to all of you for your time and early responses.
>>>
>>> After applying the patch, SM can be used by raising its priority. That is enough for me (I hope). But it still fails when I specify --mca coll sm,self on the command line (with tuned too).
>>> I am not going to use this release in production, only for playing with the code :-)
>>>
>>> Regards,
>>> Juan Antonio.
>>>
>>> On 04/07/2012, at 02:59, George Bosilca wrote:
>>>
>>>> Juan,
>>>>
>>>> Something weird is going on there. The selection mechanisms for the SM coll and the SM BTL should be very similar. However, the SM BTL successfully selects itself, while the SM coll fails to determine that all processes are local.
>>>>
>>>> In the SM coll the issue is that the remote procs do not have the LOCAL flag set, even when they are on the local node (however, the ompi_proc_local() return has a special flag stating that all processes in the job are local). I compared the initialization of the SM BTL and the SM coll. It turns out that somehow the procs returned by ompi_proc_all() and the procs provided to the add_procs of the BTLs are not identical. The second set has the local flag correctly set, so I went a little bit deeper.
>>>>
>>>> Here is what I found while toying with gdb inside:
>>>>
>>>> Breakpoint 1, mca_coll_sm_init_query (enable_progress_threads=false, enable_mpi_threads=false) at coll_sm_module.c:132
>>>>
>>>> (gdb) p procs[0]
>>>> $1 = (ompi_proc_t *) 0x109a1e8c0
>>>> (gdb) p procs[1]
>>>> $2 = (ompi_proc_t *) 0x109a1e970
>>>> (gdb) p procs[0]->proc_flags
>>>> $3 = 0
>>>> (gdb) p procs[1]->proc_flags
>>>> $4 = 4095
>>>>
>>>> Breakpoint 2, mca_btl_sm_add_procs (btl=0x109baa1c0, nprocs=2, procs=0x109a319e0, peers=0x109a319f0, reachability=0x7fff691378e8) at btl_sm.c:427
>>>>
>>>> (gdb) p procs[0]
>>>> $5 = (struct ompi_proc_t *) 0x109a1e8c0
>>>> (gdb) p procs[1]
>>>> $6 = (struct ompi_proc_t *) 0x109a1e970
>>>> (gdb) p procs[0]->proc_flags
>>>> $7 = 1920
>>>> (gdb) p procs[1]->proc_flags
>>>> $8 = 4095
>>>>
>>>> Thus the problem seems to come from the fact that during the initialization of the SM coll the flags are not correctly set. However, this is somewhat expected, as the call to the initialization happens before the exchange of the business cards (and therefore there is no way to have any knowledge about the remote procs).
>>>>
>>>> So, either something changed drastically in the way we set the flags for remote processes, or we have not been using the SM coll for the last 3 years. I think the culprit is r21967 (https://svn.open-mpi.org/trac/ompi/changeset/21967), which added a "selection" logic based on knowledge about the remote procs to the coll SM initialization function. But this selection logic runs way too early!
>>>>
>>>> I would strongly encourage you not to use this SM collective component in anything related to production runs.
>>>>
>>>> george.
>>>>
>>>> PS: However, if you want to toy with the SM coll, apply the following patch:
>>>>
>>>> Index: coll_sm_module.c
>>>> ===================================================================
>>>> --- coll_sm_module.c (revision 26737)
>>>> +++ coll_sm_module.c (working copy)
>>>> @@ -128,6 +128,7 @@
>>>>  int mca_coll_sm_init_query(bool enable_progress_threads,
>>>>                             bool enable_mpi_threads)
>>>>  {
>>>> +#if 0
>>>>      ompi_proc_t *my_proc, **procs;
>>>>      size_t i, size;
>>>>
>>>> @@ -158,7 +159,7 @@
>>>>                              "coll:sm:init_query: no other local procs; disqualifying myself");
>>>>          return OMPI_ERR_NOT_AVAILABLE;
>>>>      }
>>>> -
>>>> +#endif
>>>>      /* Don't do much here because we don't really want to allocate any
>>>>         shared memory until this component is selected to be used. */
>>>>      opal_output_verbose(10, mca_coll_base_output,
>>>>
>>>> On Jul 4, 2012, at 02:05, Ralph Castain wrote:
>>>>
>>>>> Okay, please try this again with r26739 or above. You can remove the rest of the "verbose" settings and the --display-map so we declutter the output. Please add "-mca orte_nidmap_verbose 20" to your cmd line.
>>>>>
>>>>> Thanks!
>>>>> Ralph
>>>>>
>>>>> On Tue, Jul 3, 2012 at 1:50 PM, Juan A. Rico <jar...@unex.es> wrote:
>>>>> Here is the output.
>>>>> >>>>> [jarico@Metropolis-01 examples]$ >>>>> /home/jarico/shared/packages/openmpi-cas-dbg/bin/mpiexec --bind-to-core >>>>> --bynode --mca mca_base_verbose 100 --mca mca_coll_base_output 100 --mca >>>>> coll_sm_priority 99 -mca hwloc_base_verbose 90 --display-map --mca >>>>> mca_verbose 100 --mca mca_base_verbose 100 --mca coll_base_verbose 100 -n >>>>> 2 -mca grpcomm_base_verbose 5 ./bmem >>>>> [Metropolis-01:24563] mca: base: components_open: Looking for hwloc >>>>> components >>>>> [Metropolis-01:24563] mca: base: components_open: opening hwloc components >>>>> [Metropolis-01:24563] mca: base: components_open: found loaded component >>>>> hwloc142 >>>>> [Metropolis-01:24563] mca: base: components_open: component hwloc142 has >>>>> no register function >>>>> [Metropolis-01:24563] mca: base: components_open: component hwloc142 has >>>>> no open function >>>>> [Metropolis-01:24563] hwloc:base:get_topology >>>>> [Metropolis-01:24563] hwloc:base: no cpus specified - using root >>>>> available cpuset >>>>> [Metropolis-01:24563] mca:base:select:(grpcomm) Querying component [bad] >>>>> [Metropolis-01:24563] mca:base:select:(grpcomm) Query of component [bad] >>>>> set priority to 10 >>>>> [Metropolis-01:24563] mca:base:select:(grpcomm) Selected component [bad] >>>>> [Metropolis-01:24563] [[36265,0],0] grpcomm:base:receive start comm >>>>> -------------------------------------------------------------------------- >>>>> WARNING: a request was made to bind a process. While the system >>>>> supports binding the process itself, at least one node does NOT >>>>> support binding memory to the process location. >>>>> >>>>> Node: Metropolis-01 >>>>> >>>>> This is a warning only; your job will continue, though performance may >>>>> be degraded. 
>>>>> -------------------------------------------------------------------------- >>>>> [Metropolis-01:24563] hwloc:base: get available cpus >>>>> [Metropolis-01:24563] hwloc:base:filter_cpus specified - already done >>>>> [Metropolis-01:24563] hwloc:base: get available cpus >>>>> [Metropolis-01:24563] hwloc:base:filter_cpus specified - already done >>>>> [Metropolis-01:24563] hwloc:base: get available cpus >>>>> [Metropolis-01:24563] hwloc:base:filter_cpus specified - already done >>>>> [Metropolis-01:24563] hwloc:base: get available cpus >>>>> [Metropolis-01:24563] hwloc:base:filter_cpus specified - already done >>>>> [Metropolis-01:24563] hwloc:base: get available cpus >>>>> [Metropolis-01:24563] hwloc:base:filter_cpus specified - already done >>>>> [Metropolis-01:24563] hwloc:base: get available cpus >>>>> [Metropolis-01:24563] hwloc:base:filter_cpus specified - already done >>>>> [Metropolis-01:24563] hwloc:base: get available cpus >>>>> [Metropolis-01:24563] hwloc:base:filter_cpus specified - already done >>>>> [Metropolis-01:24563] hwloc:base: get available cpus >>>>> [Metropolis-01:24563] hwloc:base:filter_cpus specified - already done >>>>> [Metropolis-01:24563] hwloc:base:get_nbojbs computed data 8 of Core:0 >>>>> [Metropolis-01:24563] hwloc:base: get available cpus >>>>> [Metropolis-01:24563] hwloc:base:filter_cpus specified - already done >>>>> [Metropolis-01:24563] hwloc:base: get available cpus >>>>> [Metropolis-01:24563] hwloc:base:filter_cpus specified - already done >>>>> >>>>> ======================== JOB MAP ======================== >>>>> >>>>> Data for node: Metropolis-01 Num procs: 2 >>>>> Process OMPI jobid: [36265,1] App: 0 Process rank: 0 >>>>> Process OMPI jobid: [36265,1] App: 0 Process rank: 1 >>>>> >>>>> ============================================================= >>>>> [Metropolis-01:24563] [[36265,0],0] grpcomm:bad:xcast sent to job >>>>> [36265,0] tag 1 >>>>> [Metropolis-01:24563] [[36265,0],0] grpcomm:xcast:recv:send_relay >>>>> 
[Metropolis-01:24563] [[36265,0],0] grpcomm:base:xcast updating daemon >>>>> nidmap >>>>> [Metropolis-01:24563] [[36265,0],0] orte:daemon:send_relay - recipient >>>>> list is empty! >>>>> [Metropolis-01:24564] mca: base: components_open: Looking for hwloc >>>>> components >>>>> [Metropolis-01:24564] mca: base: components_open: opening hwloc components >>>>> [Metropolis-01:24564] mca: base: components_open: found loaded component >>>>> hwloc142 >>>>> [Metropolis-01:24564] mca: base: components_open: component hwloc142 has >>>>> no register function >>>>> [Metropolis-01:24564] mca: base: components_open: component hwloc142 has >>>>> no open function >>>>> [Metropolis-01:24565] mca: base: components_open: Looking for hwloc >>>>> components >>>>> [Metropolis-01:24565] mca: base: components_open: opening hwloc components >>>>> [Metropolis-01:24565] mca: base: components_open: found loaded component >>>>> hwloc142 >>>>> [Metropolis-01:24565] mca: base: components_open: component hwloc142 has >>>>> no register function >>>>> [Metropolis-01:24565] mca: base: components_open: component hwloc142 has >>>>> no open function >>>>> [Metropolis-01:24564] mca:base:select:(grpcomm) Querying component [bad] >>>>> [Metropolis-01:24564] mca:base:select:(grpcomm) Query of component [bad] >>>>> set priority to 10 >>>>> [Metropolis-01:24564] mca:base:select:(grpcomm) Selected component [bad] >>>>> [Metropolis-01:24564] [[36265,1],0] grpcomm:base:receive start comm >>>>> [Metropolis-01:24564] computing locality - getting object at level CORE, >>>>> index 0 >>>>> [Metropolis-01:24564] hwloc:base: get available cpus >>>>> [Metropolis-01:24564] hwloc:base:get_available_cpus first time - >>>>> filtering cpus >>>>> [Metropolis-01:24564] hwloc:base: no cpus specified - using root >>>>> available cpuset >>>>> [Metropolis-01:24564] computing locality - getting object at level CORE, >>>>> index 1 >>>>> [Metropolis-01:24564] hwloc:base: get available cpus >>>>> [Metropolis-01:24564] 
hwloc:base:filter_cpus specified - already done >>>>> [Metropolis-01:24564] computing locality - shifting up from L1CACHE >>>>> [Metropolis-01:24564] computing locality - shifting up from L2CACHE >>>>> [Metropolis-01:24564] computing locality - shifting up from L3CACHE >>>>> [Metropolis-01:24564] computing locality - filling level SOCKET >>>>> [Metropolis-01:24564] computing locality - filling level NUMA >>>>> [Metropolis-01:24564] locality: CL:CU:N:B:Nu:S >>>>> [Metropolis-01:24565] mca:base:select:(grpcomm) Querying component [bad] >>>>> [Metropolis-01:24565] mca:base:select:(grpcomm) Query of component [bad] >>>>> set priority to 10 >>>>> [Metropolis-01:24565] mca:base:select:(grpcomm) Selected component [bad] >>>>> [Metropolis-01:24565] [[36265,1],1] grpcomm:base:receive start comm >>>>> [Metropolis-01:24564] mca: base: components_open: Looking for coll >>>>> components >>>>> [Metropolis-01:24564] mca: base: components_open: opening coll components >>>>> [Metropolis-01:24564] mca: base: components_open: found loaded component >>>>> tuned >>>>> [Metropolis-01:24564] mca: base: components_open: component tuned has no >>>>> register function >>>>> [Metropolis-01:24564] coll:tuned:component_open: done! 
>>>>> [Metropolis-01:24564] mca: base: components_open: component tuned open >>>>> function successful >>>>> [Metropolis-01:24564] mca: base: components_open: found loaded component >>>>> sm >>>>> [Metropolis-01:24564] mca: base: components_open: component sm register >>>>> function successful >>>>> [Metropolis-01:24564] mca: base: components_open: component sm has no >>>>> open function >>>>> [Metropolis-01:24564] mca: base: components_open: found loaded component >>>>> libnbc >>>>> [Metropolis-01:24564] mca: base: components_open: component libnbc >>>>> register function successful >>>>> [Metropolis-01:24564] mca: base: components_open: component libnbc open >>>>> function successful >>>>> [Metropolis-01:24564] mca: base: components_open: found loaded component >>>>> hierarch >>>>> [Metropolis-01:24564] mca: base: components_open: component hierarch has >>>>> no register function >>>>> [Metropolis-01:24564] mca: base: components_open: component hierarch open >>>>> function successful >>>>> [Metropolis-01:24564] mca: base: components_open: found loaded component >>>>> basic >>>>> [Metropolis-01:24564] mca: base: components_open: component basic >>>>> register function successful >>>>> [Metropolis-01:24564] mca: base: components_open: component basic has no >>>>> open function >>>>> [Metropolis-01:24564] mca: base: components_open: found loaded component >>>>> inter >>>>> [Metropolis-01:24564] mca: base: components_open: component inter has no >>>>> register function >>>>> [Metropolis-01:24564] mca: base: components_open: component inter open >>>>> function successful >>>>> [Metropolis-01:24564] mca: base: components_open: found loaded component >>>>> self >>>>> [Metropolis-01:24564] mca: base: components_open: component self has no >>>>> register function >>>>> [Metropolis-01:24564] mca: base: components_open: component self open >>>>> function successful >>>>> [Metropolis-01:24565] computing locality - getting object at level CORE, >>>>> index 1 >>>>> 
[Metropolis-01:24565] hwloc:base: get available cpus >>>>> [Metropolis-01:24565] hwloc:base:get_available_cpus first time - >>>>> filtering cpus >>>>> [Metropolis-01:24565] hwloc:base: no cpus specified - using root >>>>> available cpuset >>>>> [Metropolis-01:24565] hwloc:base: get available cpus >>>>> [Metropolis-01:24565] hwloc:base:filter_cpus specified - already done >>>>> [Metropolis-01:24565] computing locality - getting object at level CORE, >>>>> index 0 >>>>> [Metropolis-01:24565] computing locality - shifting up from L1CACHE >>>>> [Metropolis-01:24565] computing locality - shifting up from L2CACHE >>>>> [Metropolis-01:24565] computing locality - shifting up from L3CACHE >>>>> [Metropolis-01:24565] computing locality - filling level SOCKET >>>>> [Metropolis-01:24565] computing locality - filling level NUMA >>>>> [Metropolis-01:24565] locality: CL:CU:N:B:Nu:S >>>>> [Metropolis-01:24563] [[36265,0],0] COLLECTIVE RECVD FROM [[36265,1],0] >>>>> [Metropolis-01:24563] [[36265,0],0] WORKING COLLECTIVE 0 >>>>> [Metropolis-01:24563] [[36265,0],0] ADDING [[36265,1],WILDCARD] TO >>>>> PARTICIPANTS >>>>> [Metropolis-01:24563] [[36265,0],0] PROGRESSING COLLECTIVE 0 >>>>> [Metropolis-01:24563] [[36265,0],0] PROGRESSING COLL id 0 >>>>> [Metropolis-01:24563] [[36265,0],0] ALL LOCAL PROCS CONTRIBUTE 2 >>>>> [Metropolis-01:24564] [[36265,1],0] grpcomm:base:modex: performing modex >>>>> [Metropolis-01:24564] [[36265,1],0] grpcomm:base:pack_modex: reporting 4 >>>>> entries >>>>> [Metropolis-01:24564] [[36265,1],0] grpcomm:base:full:modex: executing >>>>> allgather >>>>> [Metropolis-01:24564] [[36265,1],0] grpcomm:bad entering allgather >>>>> [Metropolis-01:24564] [[36265,1],0] grpcomm:bad allgather underway >>>>> [Metropolis-01:24564] [[36265,1],0] grpcomm:base:modex: modex posted >>>>> [Metropolis-01:24565] mca: base: components_open: Looking for coll >>>>> components >>>>> [Metropolis-01:24565] mca: base: components_open: opening coll components >>>>> [Metropolis-01:24565] 
mca: base: components_open: found loaded component >>>>> tuned >>>>> [Metropolis-01:24565] mca: base: components_open: component tuned has no >>>>> register function >>>>> [Metropolis-01:24565] coll:tuned:component_open: done! >>>>> [Metropolis-01:24565] mca: base: components_open: component tuned open >>>>> function successful >>>>> [Metropolis-01:24565] mca: base: components_open: found loaded component >>>>> sm >>>>> [Metropolis-01:24565] mca: base: components_open: component sm register >>>>> function successful >>>>> [Metropolis-01:24565] mca: base: components_open: component sm has no >>>>> open function >>>>> [Metropolis-01:24565] mca: base: components_open: found loaded component >>>>> libnbc >>>>> [Metropolis-01:24565] mca: base: components_open: component libnbc >>>>> register function successful >>>>> [Metropolis-01:24565] mca: base: components_open: component libnbc open >>>>> function successful >>>>> [Metropolis-01:24565] mca: base: components_open: found loaded component >>>>> hierarch >>>>> [Metropolis-01:24565] mca: base: components_open: component hierarch has >>>>> no register function >>>>> [Metropolis-01:24565] mca: base: components_open: component hierarch open >>>>> function successful >>>>> [Metropolis-01:24565] mca: base: components_open: found loaded component >>>>> basic >>>>> [Metropolis-01:24565] mca: base: components_open: component basic >>>>> register function successful >>>>> [Metropolis-01:24565] mca: base: components_open: component basic has no >>>>> open function >>>>> [Metropolis-01:24565] mca: base: components_open: found loaded component >>>>> inter >>>>> [Metropolis-01:24565] mca: base: components_open: component inter has no >>>>> register function >>>>> [Metropolis-01:24565] mca: base: components_open: component inter open >>>>> function successful >>>>> [Metropolis-01:24565] mca: base: components_open: found loaded component >>>>> self >>>>> [Metropolis-01:24565] mca: base: components_open: component self has no >>>>> 
register function >>>>> [Metropolis-01:24565] mca: base: components_open: component self open >>>>> function successful >>>>> [Metropolis-01:24563] [[36265,0],0] COLLECTIVE RECVD FROM [[36265,1],1] >>>>> [Metropolis-01:24563] [[36265,0],0] WORKING COLLECTIVE 0 >>>>> [Metropolis-01:24563] [[36265,0],0] PROGRESSING COLLECTIVE 0 >>>>> [Metropolis-01:24563] [[36265,0],0] PROGRESSING COLL id 0 >>>>> [Metropolis-01:24563] [[36265,0],0] ALL LOCAL PROCS CONTRIBUTE 2 >>>>> [Metropolis-01:24563] [[36265,0],0] COLLECTIVE 0 LOCALLY COMPLETE - >>>>> SENDING TO GLOBAL COLLECTIVE >>>>> [Metropolis-01:24563] [[36265,0],0] grpcomm:base:daemon_coll: daemon >>>>> collective recvd from [[36265,0],0] >>>>> [Metropolis-01:24563] [[36265,0],0] grpcomm:base:daemon_coll: WORKING >>>>> COLLECTIVE 0 >>>>> [Metropolis-01:24563] [[36265,0],0] grpcomm:base:daemon_coll: NUM >>>>> CONTRIBS: 2 >>>>> [Metropolis-01:24563] [[36265,0],0] grpcomm:bad:xcast sent to job >>>>> [36265,1] tag 30 >>>>> [Metropolis-01:24563] [[36265,0],0] grpcomm:xcast:recv:send_relay >>>>> [Metropolis-01:24563] [[36265,0],0] orte:daemon:send_relay - recipient >>>>> list is empty! 
>>>>> [Metropolis-01:24565] [[36265,1],1] grpcomm:base:modex: performing modex >>>>> [Metropolis-01:24565] [[36265,1],1] grpcomm:base:pack_modex: reporting 4 >>>>> entries >>>>> [Metropolis-01:24565] [[36265,1],1] grpcomm:base:full:modex: executing >>>>> allgather >>>>> [Metropolis-01:24565] [[36265,1],1] grpcomm:bad entering allgather >>>>> [Metropolis-01:24565] [[36265,1],1] grpcomm:bad allgather underway >>>>> [Metropolis-01:24565] [[36265,1],1] grpcomm:base:modex: modex posted >>>>> [Metropolis-01:24564] [[36265,1],0] grpcomm:base:receive processing >>>>> collective return for id 0 >>>>> [Metropolis-01:24564] [[36265,1],0] CHECKING COLL id 0 >>>>> [Metropolis-01:24564] [[36265,1],0] STORING MODEX DATA >>>>> [Metropolis-01:24564] [[36265,1],0] grpcomm:base:store_modex adding modex >>>>> entry for proc [[36265,1],0] >>>>> [Metropolis-01:24565] [[36265,1],1] grpcomm:base:receive processing >>>>> collective return for id 0 >>>>> [Metropolis-01:24565] [[36265,1],1] CHECKING COLL id 0 >>>>> [Metropolis-01:24565] [[36265,1],1] STORING MODEX DATA >>>>> [Metropolis-01:24565] [[36265,1],1] grpcomm:base:store_modex adding modex >>>>> entry for proc [[36265,1],0] >>>>> [Metropolis-01:24564] [[36265,1],0] grpcomm:base:update_modex_entries: >>>>> adding 4 entries for proc [[36265,1],0] >>>>> [Metropolis-01:24564] [[36265,1],0] grpcomm:base:store_modex adding modex >>>>> entry for proc [[36265,1],1] >>>>> [Metropolis-01:24564] [[36265,1],0] grpcomm:base:update_modex_entries: >>>>> adding 4 entries for proc [[36265,1],1] >>>>> [Metropolis-01:24565] [[36265,1],1] grpcomm:base:update_modex_entries: >>>>> adding 4 entries for proc [[36265,1],0] >>>>> [Metropolis-01:24565] [[36265,1],1] grpcomm:base:store_modex adding modex >>>>> entry for proc [[36265,1],1] >>>>> [Metropolis-01:24565] [[36265,1],1] grpcomm:base:update_modex_entries: >>>>> adding 4 entries for proc [[36265,1],1] >>>>> [Metropolis-01:24564] coll:find_available: querying coll component tuned >>>>> 
[Metropolis-01:24564] coll:find_available: coll component tuned is >>>>> available >>>>> [Metropolis-01:24565] coll:find_available: querying coll component tuned >>>>> [Metropolis-01:24565] coll:find_available: coll component tuned is >>>>> available >>>>> [Metropolis-01:24565] coll:find_available: querying coll component sm >>>>> [Metropolis-01:24564] coll:find_available: querying coll component sm >>>>> [Metropolis-01:24564] coll:sm:init_query: no other local procs; >>>>> disqualifying myself >>>>> [Metropolis-01:24564] coll:find_available: coll component sm is not >>>>> available >>>>> [Metropolis-01:24564] coll:find_available: querying coll component libnbc >>>>> [Metropolis-01:24564] coll:find_available: coll component libnbc is >>>>> available >>>>> [Metropolis-01:24564] coll:find_available: querying coll component >>>>> hierarch >>>>> [Metropolis-01:24564] coll:find_available: coll component hierarch is >>>>> available >>>>> [Metropolis-01:24564] coll:find_available: querying coll component basic >>>>> [Metropolis-01:24564] coll:find_available: coll component basic is >>>>> available >>>>> [Metropolis-01:24565] coll:sm:init_query: no other local procs; >>>>> disqualifying myself >>>>> [Metropolis-01:24565] coll:find_available: coll component sm is not >>>>> available >>>>> [Metropolis-01:24565] coll:find_available: querying coll component libnbc >>>>> [Metropolis-01:24565] coll:find_available: coll component libnbc is >>>>> available >>>>> [Metropolis-01:24565] coll:find_available: querying coll component >>>>> hierarch >>>>> [Metropolis-01:24565] coll:find_available: coll component hierarch is >>>>> available >>>>> [Metropolis-01:24565] coll:find_available: querying coll component basic >>>>> [Metropolis-01:24565] coll:find_available: coll component basic is >>>>> available >>>>> [Metropolis-01:24564] coll:find_available: querying coll component inter >>>>> [Metropolis-01:24564] coll:find_available: coll component inter is >>>>> available >>>>> 
[Metropolis-01:24564] coll:find_available: querying coll component self >>>>> [Metropolis-01:24564] coll:find_available: coll component self is >>>>> available >>>>> [Metropolis-01:24565] coll:find_available: querying coll component inter >>>>> [Metropolis-01:24565] coll:find_available: coll component inter is >>>>> available >>>>> [Metropolis-01:24565] coll:find_available: querying coll component self >>>>> [Metropolis-01:24565] coll:find_available: coll component self is >>>>> available >>>>> [Metropolis-01:24565] hwloc:base:get_nbojbs computed data 0 of NUMANode:0 >>>>> [Metropolis-01:24564] hwloc:base:get_nbojbs computed data 0 of NUMANode:0 >>>>> [Metropolis-01:24563] [[36265,0],0] COLLECTIVE RECVD FROM [[36265,1],1] >>>>> [Metropolis-01:24563] [[36265,0],0] WORKING COLLECTIVE 1 >>>>> [Metropolis-01:24563] [[36265,0],0] ADDING [[36265,1],WILDCARD] TO >>>>> PARTICIPANTS >>>>> [Metropolis-01:24563] [[36265,0],0] PROGRESSING COLLECTIVE 1 >>>>> [Metropolis-01:24563] [[36265,0],0] PROGRESSING COLL id 1 >>>>> [Metropolis-01:24563] [[36265,0],0] ALL LOCAL PROCS CONTRIBUTE 2 >>>>> [Metropolis-01:24563] [[36265,0],0] COLLECTIVE RECVD FROM [[36265,1],0] >>>>> [Metropolis-01:24563] [[36265,0],0] WORKING COLLECTIVE 1 >>>>> [Metropolis-01:24563] [[36265,0],0] PROGRESSING COLLECTIVE 1 >>>>> [Metropolis-01:24563] [[36265,0],0] PROGRESSING COLL id 1 >>>>> [Metropolis-01:24563] [[36265,0],0] ALL LOCAL PROCS CONTRIBUTE 2 >>>>> [Metropolis-01:24563] [[36265,0],0] COLLECTIVE 1 LOCALLY COMPLETE - >>>>> SENDING TO GLOBAL COLLECTIVE >>>>> [Metropolis-01:24563] [[36265,0],0] grpcomm:base:daemon_coll: daemon >>>>> collective recvd from [[36265,0],0] >>>>> [Metropolis-01:24563] [[36265,0],0] grpcomm:base:daemon_coll: WORKING >>>>> COLLECTIVE 1 >>>>> [Metropolis-01:24563] [[36265,0],0] grpcomm:base:daemon_coll: NUM >>>>> CONTRIBS: 2 >>>>> [Metropolis-01:24563] [[36265,0],0] grpcomm:bad:xcast sent to job >>>>> [36265,1] tag 30 >>>>> [Metropolis-01:24563] [[36265,0],0] 
grpcomm:xcast:recv:send_relay >>>>> [Metropolis-01:24563] [[36265,0],0] orte:daemon:send_relay - recipient >>>>> list is empty! >>>>> [Metropolis-01:24565] [[36265,1],1] grpcomm:bad entering barrier >>>>> [Metropolis-01:24565] [[36265,1],1] grpcomm:bad barrier underway >>>>> [Metropolis-01:24564] [[36265,1],0] grpcomm:bad entering barrier >>>>> [Metropolis-01:24564] [[36265,1],0] grpcomm:bad barrier underway >>>>> [Metropolis-01:24564] [[36265,1],0] grpcomm:base:receive processing >>>>> collective return for id 1 >>>>> [Metropolis-01:24564] [[36265,1],0] CHECKING COLL id 1 >>>>> [Metropolis-01:24565] [[36265,1],1] grpcomm:base:receive processing >>>>> collective return for id 1 >>>>> [Metropolis-01:24565] [[36265,1],1] CHECKING COLL id 1 >>>>> [Metropolis-01:24565] coll:base:comm_select: new communicator: >>>>> MPI_COMM_WORLD (cid 0) >>>>> [Metropolis-01:24565] coll:base:comm_select: Checking all available >>>>> modules >>>>> [Metropolis-01:24565] coll:tuned:module_tuned query called >>>>> [Metropolis-01:24565] coll:base:comm_select: component available: tuned, >>>>> priority: 30 >>>>> [Metropolis-01:24565] coll:base:comm_select: component available: libnbc, >>>>> priority: 10 >>>>> [Metropolis-01:24565] coll:base:comm_select: component not available: >>>>> hierarch >>>>> [Metropolis-01:24565] coll:base:comm_select: component available: basic, >>>>> priority: 10 >>>>> [Metropolis-01:24565] coll:base:comm_select: component not available: >>>>> inter >>>>> [Metropolis-01:24565] coll:base:comm_select: component not available: self >>>>> [Metropolis-01:24565] coll:tuned:module_init called. 
>>>>> [Metropolis-01:24565] coll:tuned:module_init Tuned is in use >>>>> [Metropolis-01:24565] coll:base:comm_select: new communicator: >>>>> MPI_COMM_SELF (cid 1) >>>>> [Metropolis-01:24565] coll:base:comm_select: Checking all available >>>>> modules >>>>> [Metropolis-01:24564] coll:base:comm_select: new communicator: >>>>> MPI_COMM_WORLD (cid 0) >>>>> [Metropolis-01:24564] coll:base:comm_select: Checking all available >>>>> modules >>>>> [Metropolis-01:24564] coll:tuned:module_tuned query called >>>>> [Metropolis-01:24564] coll:base:comm_select: component available: tuned, >>>>> priority: 30 >>>>> [Metropolis-01:24564] coll:base:comm_select: component available: libnbc, >>>>> priority: 10 >>>>> [Metropolis-01:24564] coll:base:comm_select: component not available: >>>>> hierarch >>>>> [Metropolis-01:24564] coll:base:comm_select: component available: basic, >>>>> priority: 10 >>>>> [Metropolis-01:24564] coll:base:comm_select: component not available: >>>>> inter >>>>> [Metropolis-01:24564] coll:base:comm_select: component not available: self >>>>> [Metropolis-01:24564] coll:tuned:module_init called. 
>>>>> [Metropolis-01:24565] coll:tuned:module_tuned query called >>>>> [Metropolis-01:24565] coll:base:comm_select: component not available: >>>>> tuned >>>>> [Metropolis-01:24565] coll:base:comm_select: component available: libnbc, >>>>> priority: 10 >>>>> [Metropolis-01:24565] coll:base:comm_select: component not available: >>>>> hierarch >>>>> [Metropolis-01:24565] coll:base:comm_select: component available: basic, >>>>> priority: 10 >>>>> [Metropolis-01:24565] coll:base:comm_select: component not available: >>>>> inter >>>>> [Metropolis-01:24565] coll:base:comm_select: component available: self, >>>>> priority: 75 >>>>> [Metropolis-01:24564] coll:tuned:module_init Tuned is in use >>>>> [Metropolis-01:24564] coll:base:comm_select: new communicator: >>>>> MPI_COMM_SELF (cid 1) >>>>> [Metropolis-01:24564] coll:base:comm_select: Checking all available >>>>> modules >>>>> [Metropolis-01:24564] coll:tuned:module_tuned query called >>>>> [Metropolis-01:24564] coll:base:comm_select: component not available: >>>>> tuned >>>>> [Metropolis-01:24564] coll:base:comm_select: component available: libnbc, >>>>> priority: 10 >>>>> [Metropolis-01:24564] coll:base:comm_select: component not available: >>>>> hierarch >>>>> [Metropolis-01:24564] coll:base:comm_select: component available: basic, >>>>> priority: 10 >>>>> [Metropolis-01:24564] coll:base:comm_select: component not available: >>>>> inter >>>>> [Metropolis-01:24564] coll:base:comm_select: component available: self, >>>>> priority: 75 >>>>> [Metropolis-01:24565] [[36265,1],1] grpcomm:bad entering barrier >>>>> [Metropolis-01:24563] [[36265,0],0] COLLECTIVE RECVD FROM [[36265,1],1] >>>>> [Metropolis-01:24563] [[36265,0],0] WORKING COLLECTIVE 2 >>>>> [Metropolis-01:24563] [[36265,0],0] ADDING [[36265,1],WILDCARD] TO >>>>> PARTICIPANTS >>>>> [Metropolis-01:24563] [[36265,0],0] PROGRESSING COLLECTIVE 2 >>>>> [Metropolis-01:24563] [[36265,0],0] PROGRESSING COLL id 2 >>>>> [Metropolis-01:24563] [[36265,0],0] ALL LOCAL PROCS 
CONTRIBUTE 2 >>>>> [Metropolis-01:24563] [[36265,0],0] COLLECTIVE RECVD FROM [[36265,1],0] >>>>> [Metropolis-01:24563] [[36265,0],0] WORKING COLLECTIVE 2 >>>>> [Metropolis-01:24563] [[36265,0],0] PROGRESSING COLLECTIVE 2 >>>>> [Metropolis-01:24563] [[36265,0],0] PROGRESSING COLL id 2 >>>>> [Metropolis-01:24563] [[36265,0],0] ALL LOCAL PROCS CONTRIBUTE 2 >>>>> [Metropolis-01:24563] [[36265,0],0] COLLECTIVE 2 LOCALLY COMPLETE - >>>>> SENDING TO GLOBAL COLLECTIVE >>>>> [Metropolis-01:24563] [[36265,0],0] grpcomm:base:daemon_coll: daemon >>>>> collective recvd from [[36265,0],0] >>>>> [Metropolis-01:24563] [[36265,0],0] grpcomm:base:daemon_coll: WORKING >>>>> COLLECTIVE 2 >>>>> [Metropolis-01:24563] [[36265,0],0] grpcomm:base:daemon_coll: NUM >>>>> CONTRIBS: 2 >>>>> [Metropolis-01:24563] [[36265,0],0] grpcomm:bad:xcast sent to job >>>>> [36265,1] tag 30 >>>>> [Metropolis-01:24563] [[36265,0],0] grpcomm:xcast:recv:send_relay >>>>> [Metropolis-01:24563] [[36265,0],0] orte:daemon:send_relay - recipient >>>>> list is empty! >>>>> [Metropolis-01:24564] [[36265,1],0] grpcomm:bad entering barrier >>>>> [Metropolis-01:24564] [[36265,1],0] grpcomm:bad barrier underway >>>>> [Metropolis-01:24564] [[36265,1],0] grpcomm:base:receive processing >>>>> collective return for id 2 >>>>> [Metropolis-01:24564] [[36265,1],0] CHECKING COLL id 2 >>>>> [Metropolis-01:24565] [[36265,1],1] grpcomm:bad barrier underway >>>>> [Metropolis-01:24565] [[36265,1],1] grpcomm:base:receive processing >>>>> collective return for id 2 >>>>> [Metropolis-01:24565] [[36265,1],1] CHECKING COLL id 2 >>>>> [Metropolis-01:24565] coll:tuned:component_close: called >>>>> [Metropolis-01:24565] coll:tuned:component_close: done! 
>>>>> [Metropolis-01:24565] mca: base: close: component tuned closed
>>>>> [Metropolis-01:24565] mca: base: close: unloading component tuned
>>>>> [Metropolis-01:24565] mca: base: close: component libnbc closed
>>>>> [Metropolis-01:24565] mca: base: close: unloading component libnbc
>>>>> [Metropolis-01:24565] mca: base: close: unloading component hierarch
>>>>> [Metropolis-01:24565] mca: base: close: unloading component basic
>>>>> [Metropolis-01:24565] mca: base: close: unloading component inter
>>>>> [Metropolis-01:24565] mca: base: close: unloading component self
>>>>> [Metropolis-01:24565] [[36265,1],1] grpcomm:base:receive stop comm
>>>>> [Metropolis-01:24564] coll:tuned:component_close: called
>>>>> [Metropolis-01:24564] coll:tuned:component_close: done!
>>>>> [Metropolis-01:24564] mca: base: close: component tuned closed
>>>>> [Metropolis-01:24564] mca: base: close: unloading component tuned
>>>>> [Metropolis-01:24564] mca: base: close: component libnbc closed
>>>>> [Metropolis-01:24564] mca: base: close: unloading component libnbc
>>>>> [Metropolis-01:24564] mca: base: close: unloading component hierarch
>>>>> [Metropolis-01:24564] mca: base: close: unloading component basic
>>>>> [Metropolis-01:24564] mca: base: close: unloading component inter
>>>>> [Metropolis-01:24564] mca: base: close: unloading component self
>>>>> [Metropolis-01:24564] [[36265,1],0] grpcomm:base:receive stop comm
>>>>> [Metropolis-01:24563] [[36265,0],0] grpcomm:bad:xcast sent to job [36265,0] tag 1
>>>>> [Metropolis-01:24563] [[36265,0],0] grpcomm:xcast:recv:send_relay
>>>>> [Metropolis-01:24563] [[36265,0],0] orte:daemon:send_relay - recipient list is empty!
>>>>> [jarico@Metropolis-01 examples]$
>>>>>
>>>>> On 03/07/2012, at 21:44, Ralph Castain wrote:
>>>>>
>>>>>> Interesting - yes, coll sm doesn't think they are on the same node for some reason.
Try adding -mca grpcomm_base_verbose 5 and let's see why
>>>>>>
>>>>>>
>>>>>> On Jul 3, 2012, at 1:24 PM, Juan Antonio Rico Gallego wrote:
>>>>>>
>>>>>>> The code I run is a simple broadcast.
>>>>>>>
>>>>>>> When I do not specify components to run, the output is (more verbose):
>>>>>>>
>>>>>>> [jarico@Metropolis-01 examples]$ /home/jarico/shared/packages/openmpi-cas-dbg/bin/mpiexec --mca mca_base_verbose 100 --mca mca_coll_base_output 100 --mca coll_sm_priority 99 -mca hwloc_base_verbose 90 --display-map --mca mca_verbose 100 --mca mca_base_verbose 100 --mca coll_base_verbose 100 -n 2 ./bmem
>>>>>>> [Metropolis-01:24490] mca: base: components_open: Looking for hwloc components
>>>>>>> [Metropolis-01:24490] mca: base: components_open: opening hwloc components
>>>>>>> [Metropolis-01:24490] mca: base: components_open: found loaded component hwloc142
>>>>>>> [Metropolis-01:24490] mca: base: components_open: component hwloc142 has no register function
>>>>>>> [Metropolis-01:24490] mca: base: components_open: component hwloc142 has no open function
>>>>>>> [Metropolis-01:24490] hwloc:base:get_topology
>>>>>>> [Metropolis-01:24490] hwloc:base: no cpus specified - using root available cpuset
>>>>>>>
>>>>>>> ======================== JOB MAP ========================
>>>>>>>
>>>>>>> Data for node: Metropolis-01 Num procs: 2
>>>>>>> Process OMPI jobid: [36336,1] App: 0 Process rank: 0
>>>>>>> Process OMPI jobid: [36336,1] App: 0 Process rank: 1
>>>>>>>
>>>>>>> =============================================================
>>>>>>> [Metropolis-01:24491] mca: base: components_open: Looking for hwloc components
>>>>>>> [Metropolis-01:24491] mca: base: components_open: opening hwloc components
>>>>>>> [Metropolis-01:24491] mca: base: components_open: found loaded component hwloc142
>>>>>>> [Metropolis-01:24491] mca: base: components_open: component hwloc142 has no
register function
>>>>>>> [Metropolis-01:24491] mca: base: components_open: component hwloc142 has no open function
>>>>>>> [Metropolis-01:24492] mca: base: components_open: Looking for hwloc components
>>>>>>> [Metropolis-01:24492] mca: base: components_open: opening hwloc components
>>>>>>> [Metropolis-01:24492] mca: base: components_open: found loaded component hwloc142
>>>>>>> [Metropolis-01:24492] mca: base: components_open: component hwloc142 has no register function
>>>>>>> [Metropolis-01:24492] mca: base: components_open: component hwloc142 has no open function
>>>>>>> [Metropolis-01:24491] locality: CL:CU:N:B
>>>>>>> [Metropolis-01:24491] hwloc:base: get available cpus
>>>>>>> [Metropolis-01:24491] hwloc:base:get_available_cpus first time - filtering cpus
>>>>>>> [Metropolis-01:24491] hwloc:base: no cpus specified - using root available cpuset
>>>>>>> [Metropolis-01:24491] hwloc:base:get_available_cpus root object
>>>>>>> [Metropolis-01:24491] mca: base: components_open: Looking for coll components
>>>>>>> [Metropolis-01:24491] mca: base: components_open: opening coll components
>>>>>>> [Metropolis-01:24491] mca: base: components_open: found loaded component tuned
>>>>>>> [Metropolis-01:24491] mca: base: components_open: component tuned has no register function
>>>>>>> [Metropolis-01:24491] coll:tuned:component_open: done!
>>>>>>> [Metropolis-01:24491] mca: base: components_open: component tuned open function successful
>>>>>>> [Metropolis-01:24491] mca: base: components_open: found loaded component sm
>>>>>>> [Metropolis-01:24491] mca: base: components_open: component sm register function successful
>>>>>>> [Metropolis-01:24491] mca: base: components_open: component sm has no open function
>>>>>>> [Metropolis-01:24491] mca: base: components_open: found loaded component libnbc
>>>>>>> [Metropolis-01:24491] mca: base: components_open: component libnbc register function successful
>>>>>>> [Metropolis-01:24491] mca: base: components_open: component libnbc open function successful
>>>>>>> [Metropolis-01:24491] mca: base: components_open: found loaded component hierarch
>>>>>>> [Metropolis-01:24491] mca: base: components_open: component hierarch has no register function
>>>>>>> [Metropolis-01:24491] mca: base: components_open: component hierarch open function successful
>>>>>>> [Metropolis-01:24491] mca: base: components_open: found loaded component basic
>>>>>>> [Metropolis-01:24491] mca: base: components_open: component basic register function successful
>>>>>>> [Metropolis-01:24491] mca: base: components_open: component basic has no open function
>>>>>>> [Metropolis-01:24491] mca: base: components_open: found loaded component inter
>>>>>>> [Metropolis-01:24491] mca: base: components_open: component inter has no register function
>>>>>>> [Metropolis-01:24491] mca: base: components_open: component inter open function successful
>>>>>>> [Metropolis-01:24491] mca: base: components_open: found loaded component self
>>>>>>> [Metropolis-01:24491] mca: base: components_open: component self has no register function
>>>>>>> [Metropolis-01:24491] mca: base: components_open: component self open function successful
>>>>>>> [Metropolis-01:24492]
locality: CL:CU:N:B
>>>>>>> [Metropolis-01:24492] hwloc:base: get available cpus
>>>>>>> [Metropolis-01:24492] hwloc:base:get_available_cpus first time - filtering cpus
>>>>>>> [Metropolis-01:24492] hwloc:base: no cpus specified - using root available cpuset
>>>>>>> [Metropolis-01:24492] hwloc:base:get_available_cpus root object
>>>>>>> [Metropolis-01:24492] mca: base: components_open: Looking for coll components
>>>>>>> [Metropolis-01:24492] mca: base: components_open: opening coll components
>>>>>>> [Metropolis-01:24492] mca: base: components_open: found loaded component tuned
>>>>>>> [Metropolis-01:24492] mca: base: components_open: component tuned has no register function
>>>>>>> [Metropolis-01:24492] coll:tuned:component_open: done!
>>>>>>> [Metropolis-01:24492] mca: base: components_open: component tuned open function successful
>>>>>>> [Metropolis-01:24492] mca: base: components_open: found loaded component sm
>>>>>>> [Metropolis-01:24492] mca: base: components_open: component sm register function successful
>>>>>>> [Metropolis-01:24492] mca: base: components_open: component sm has no open function
>>>>>>> [Metropolis-01:24492] mca: base: components_open: found loaded component libnbc
>>>>>>> [Metropolis-01:24492] mca: base: components_open: component libnbc register function successful
>>>>>>> [Metropolis-01:24492] mca: base: components_open: component libnbc open function successful
>>>>>>> [Metropolis-01:24492] mca: base: components_open: found loaded component hierarch
>>>>>>> [Metropolis-01:24492] mca: base: components_open: component hierarch has no register function
>>>>>>> [Metropolis-01:24492] mca: base: components_open: component hierarch open function successful
>>>>>>> [Metropolis-01:24492] mca: base: components_open: found loaded component basic
>>>>>>> [Metropolis-01:24492] mca: base: components_open: component
basic register function successful
>>>>>>> [Metropolis-01:24492] mca: base: components_open: component basic has no open function
>>>>>>> [Metropolis-01:24492] mca: base: components_open: found loaded component inter
>>>>>>> [Metropolis-01:24492] mca: base: components_open: component inter has no register function
>>>>>>> [Metropolis-01:24492] mca: base: components_open: component inter open function successful
>>>>>>> [Metropolis-01:24492] mca: base: components_open: found loaded component self
>>>>>>> [Metropolis-01:24492] mca: base: components_open: component self has no register function
>>>>>>> [Metropolis-01:24492] mca: base: components_open: component self open function successful
>>>>>>> [Metropolis-01:24491] coll:find_available: querying coll component tuned
>>>>>>> [Metropolis-01:24491] coll:find_available: coll component tuned is available
>>>>>>> [Metropolis-01:24491] coll:find_available: querying coll component sm
>>>>>>> [Metropolis-01:24491] coll:sm:init_query: no other local procs; disqualifying myself
>>>>>>> [Metropolis-01:24491] coll:find_available: coll component sm is not available
>>>>>>> [Metropolis-01:24491] coll:find_available: querying coll component libnbc
>>>>>>> [Metropolis-01:24491] coll:find_available: coll component libnbc is available
>>>>>>> [Metropolis-01:24491] coll:find_available: querying coll component hierarch
>>>>>>> [Metropolis-01:24491] coll:find_available: coll component hierarch is available
>>>>>>> [Metropolis-01:24491] coll:find_available: querying coll component basic
>>>>>>> [Metropolis-01:24491] coll:find_available: coll component basic is available
>>>>>>> [Metropolis-01:24491] coll:find_available: querying coll component inter
>>>>>>> [Metropolis-01:24492] coll:find_available: querying coll component tuned
>>>>>>> [Metropolis-01:24492] coll:find_available: coll component tuned is
available
>>>>>>> [Metropolis-01:24492] coll:find_available: querying coll component sm
>>>>>>> [Metropolis-01:24492] coll:sm:init_query: no other local procs; disqualifying myself
>>>>>>> [Metropolis-01:24492] coll:find_available: coll component sm is not available
>>>>>>> [Metropolis-01:24492] coll:find_available: querying coll component libnbc
>>>>>>> [Metropolis-01:24492] coll:find_available: coll component libnbc is available
>>>>>>> [Metropolis-01:24492] coll:find_available: querying coll component hierarch
>>>>>>> [Metropolis-01:24492] coll:find_available: coll component hierarch is available
>>>>>>> [Metropolis-01:24492] coll:find_available: querying coll component basic
>>>>>>> [Metropolis-01:24492] coll:find_available: coll component basic is available
>>>>>>> [Metropolis-01:24492] coll:find_available: querying coll component inter
>>>>>>> [Metropolis-01:24492] coll:find_available: coll component inter is available
>>>>>>> [Metropolis-01:24492] coll:find_available: querying coll component self
>>>>>>> [Metropolis-01:24492] coll:find_available: coll component self is available
>>>>>>> [Metropolis-01:24491] coll:find_available: coll component inter is available
>>>>>>> [Metropolis-01:24491] coll:find_available: querying coll component self
>>>>>>> [Metropolis-01:24491] coll:find_available: coll component self is available
>>>>>>> [Metropolis-01:24492] hwloc:base:get_nbojbs computed data 0 of NUMANode:0
>>>>>>> [Metropolis-01:24491] hwloc:base:get_nbojbs computed data 0 of NUMANode:0
>>>>>>> [Metropolis-01:24491] coll:base:comm_select: new communicator: MPI_COMM_WORLD (cid 0)
>>>>>>> [Metropolis-01:24491] coll:base:comm_select: Checking all available modules
>>>>>>> [Metropolis-01:24491] coll:tuned:module_tuned query called
>>>>>>> [Metropolis-01:24491] coll:base:comm_select: component available: tuned, priority: 30
>>>>>>>
[Metropolis-01:24491] coll:base:comm_select: component available: libnbc, priority: 10
>>>>>>> [Metropolis-01:24491] coll:base:comm_select: component not available: hierarch
>>>>>>> [Metropolis-01:24491] coll:base:comm_select: component available: basic, priority: 10
>>>>>>> [Metropolis-01:24491] coll:base:comm_select: component not available: inter
>>>>>>> [Metropolis-01:24491] coll:base:comm_select: component not available: self
>>>>>>> [Metropolis-01:24491] coll:tuned:module_init called.
>>>>>>> [Metropolis-01:24491] coll:tuned:module_init Tuned is in use
>>>>>>> [Metropolis-01:24491] coll:base:comm_select: new communicator: MPI_COMM_SELF (cid 1)
>>>>>>> [Metropolis-01:24491] coll:base:comm_select: Checking all available modules
>>>>>>> [Metropolis-01:24491] coll:tuned:module_tuned query called
>>>>>>> [Metropolis-01:24491] coll:base:comm_select: component not available: tuned
>>>>>>> [Metropolis-01:24491] coll:base:comm_select: component available: libnbc, priority: 10
>>>>>>> [Metropolis-01:24491] coll:base:comm_select: component not available: hierarch
>>>>>>> [Metropolis-01:24491] coll:base:comm_select: component available: basic, priority: 10
>>>>>>> [Metropolis-01:24491] coll:base:comm_select: component not available: inter
>>>>>>> [Metropolis-01:24491] coll:base:comm_select: component available: self, priority: 75
>>>>>>> [Metropolis-01:24492] coll:base:comm_select: new communicator: MPI_COMM_WORLD (cid 0)
>>>>>>> [Metropolis-01:24492] coll:base:comm_select: Checking all available modules
>>>>>>> [Metropolis-01:24492] coll:tuned:module_tuned query called
>>>>>>> [Metropolis-01:24492] coll:base:comm_select: component available: tuned, priority: 30
>>>>>>> [Metropolis-01:24492] coll:base:comm_select: component available: libnbc, priority: 10
>>>>>>> [Metropolis-01:24492] coll:base:comm_select: component not available:
>>>>>>> hierarch
>>>>>>> [Metropolis-01:24492] coll:base:comm_select: component available: basic, priority: 10
>>>>>>> [Metropolis-01:24492] coll:base:comm_select: component not available: inter
>>>>>>> [Metropolis-01:24492] coll:base:comm_select: component not available: self
>>>>>>> [Metropolis-01:24492] coll:tuned:module_init called.
>>>>>>> [Metropolis-01:24492] coll:tuned:module_init Tuned is in use
>>>>>>> [Metropolis-01:24492] coll:base:comm_select: new communicator: MPI_COMM_SELF (cid 1)
>>>>>>> [Metropolis-01:24492] coll:base:comm_select: Checking all available modules
>>>>>>> [Metropolis-01:24492] coll:tuned:module_tuned query called
>>>>>>> [Metropolis-01:24492] coll:base:comm_select: component not available: tuned
>>>>>>> [Metropolis-01:24492] coll:base:comm_select: component available: libnbc, priority: 10
>>>>>>> [Metropolis-01:24492] coll:base:comm_select: component not available: hierarch
>>>>>>> [Metropolis-01:24492] coll:base:comm_select: component available: basic, priority: 10
>>>>>>> [Metropolis-01:24492] coll:base:comm_select: component not available: inter
>>>>>>> [Metropolis-01:24492] coll:base:comm_select: component available: self, priority: 75
>>>>>>> [Metropolis-01:24491] coll:tuned:component_close: called
>>>>>>> [Metropolis-01:24491] coll:tuned:component_close: done!
>>>>>>> [Metropolis-01:24492] coll:tuned:component_close: called
>>>>>>> [Metropolis-01:24492] coll:tuned:component_close: done!
>>>>>>> [Metropolis-01:24492] mca: base: close: component tuned closed
>>>>>>> [Metropolis-01:24492] mca: base: close: unloading component tuned
>>>>>>> [Metropolis-01:24492] mca: base: close: component libnbc closed
>>>>>>> [Metropolis-01:24492] mca: base: close: unloading component libnbc
>>>>>>> [Metropolis-01:24492] mca: base: close: unloading component hierarch
>>>>>>> [Metropolis-01:24492] mca: base: close: unloading component basic
>>>>>>> [Metropolis-01:24492] mca: base: close: unloading component inter
>>>>>>> [Metropolis-01:24492] mca: base: close: unloading component self
>>>>>>> [Metropolis-01:24491] mca: base: close: component tuned closed
>>>>>>> [Metropolis-01:24491] mca: base: close: unloading component tuned
>>>>>>> [Metropolis-01:24491] mca: base: close: component libnbc closed
>>>>>>> [Metropolis-01:24491] mca: base: close: unloading component libnbc
>>>>>>> [Metropolis-01:24491] mca: base: close: unloading component hierarch
>>>>>>> [Metropolis-01:24491] mca: base: close: unloading component basic
>>>>>>> [Metropolis-01:24491] mca: base: close: unloading component inter
>>>>>>> [Metropolis-01:24491] mca: base: close: unloading component self
>>>>>>> [jarico@Metropolis-01 examples]$
>>>>>>>
>>>>>>>
>>>>>>> SM is not loaded because it detects no other processes on the same machine:
>>>>>>>
>>>>>>> [Metropolis-01:24491] coll:sm:init_query: no other local procs; disqualifying myself
>>>>>>>
>>>>>>> The machine is a multicore machine with 8 cores.
>>>>>>>
>>>>>>> I need to run the SM component code, and I suppose that, by raising its priority, it will be the component selected once the problem is solved.
>>>>>>>
>>>>>>> On 03/07/2012, at 21:01, Jeff Squyres wrote:
>>>>>>>
>>>>>>>> The issue is that the "sm" coll component only implements a few of the MPI collective operations. It is usually mixed at run-time with other coll components to fill out the rest of the MPI collective operations.
>>>>>>>>
>>>>>>>> So what is happening is that OMPI is determining that it doesn't have implementations of all the MPI collective operations and aborting.
>>>>>>>>
>>>>>>>> You shouldn't need to manually select your coll module -- OMPI should automatically select the right collective module for you. E.g., if all procs are local on a single machine and sm has a matching implementation for that MPI collective operation, it'll be used.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Jul 3, 2012, at 2:48 PM, Juan Antonio Rico Gallego wrote:
>>>>>>>>
>>>>>>>>> Output is:
>>>>>>>>>
>>>>>>>>> [Metropolis-01:15355] hwloc:base:get_topology
>>>>>>>>> [Metropolis-01:15355] hwloc:base: no cpus specified - using root available cpuset
>>>>>>>>>
>>>>>>>>> ======================== JOB MAP ========================
>>>>>>>>>
>>>>>>>>> Data for node: Metropolis-01 Num procs: 2
>>>>>>>>> Process OMPI jobid: [59809,1] App: 0 Process rank: 0
>>>>>>>>> Process OMPI jobid: [59809,1] App: 0 Process rank: 1
>>>>>>>>>
>>>>>>>>> =============================================================
>>>>>>>>> [Metropolis-01:15356] locality: CL:CU:N:B
>>>>>>>>> [Metropolis-01:15356] hwloc:base: get available cpus
>>>>>>>>> [Metropolis-01:15356] hwloc:base:get_available_cpus first time - filtering cpus
>>>>>>>>> [Metropolis-01:15356] hwloc:base: no cpus specified - using root available cpuset
>>>>>>>>> [Metropolis-01:15356] hwloc:base:get_available_cpus root object
>>>>>>>>> [Metropolis-01:15357] locality: CL:CU:N:B
>>>>>>>>> [Metropolis-01:15357] hwloc:base: get available cpus
>>>>>>>>> [Metropolis-01:15357] hwloc:base:get_available_cpus first time - filtering cpus
>>>>>>>>> [Metropolis-01:15357] hwloc:base: no cpus specified - using root available cpuset
>>>>>>>>> [Metropolis-01:15357] hwloc:base:get_available_cpus root object
>>>>>>>>> [Metropolis-01:15356] hwloc:base:get_nbojbs computed data 0 of NUMANode:0
>>>>>>>>> [Metropolis-01:15357] hwloc:base:get_nbojbs computed data 0 of NUMANode:0
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Juan A. Rico
>>>>>>>>> _______________________________________________
>>>>>>>>> devel mailing list
>>>>>>>>> de...@open-mpi.org
>>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>>>>>
>>>>>>>> --
>>>>>>>> Jeff Squyres
>>>>>>>> jsquy...@cisco.com
>>>>>>>> For corporate legal information go to:
>>>>>>>> http://www.cisco.com/web/about/doing_business/legal/cri/
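[Editor's sketch] The selection mechanism the verbose logs above trace out — each coll component either disqualifies itself at init time (as coll sm does when it sees no other local procs) or reports a priority, and comm_select then keeps the highest-priority available component — can be modeled roughly as follows. This is an illustrative sketch, not Open MPI's actual implementation; the function names are hypothetical, and the component names and priorities are taken from the log output above.

```python
def init_query(component, n_local_peers):
    """A component may disqualify itself at init time. Hypothetical model of
    coll sm's "no other local procs; disqualifying myself" behavior."""
    if component == "sm" and n_local_peers == 0:
        return None
    return component

def comm_select(available, priorities):
    """Pick the highest-priority component that is available for a
    communicator; components without a priority here stand for those that
    reported 'component not available'."""
    candidates = [(priorities[c], c) for c in available if c in priorities]
    if not candidates:
        raise RuntimeError("no coll component available")
    return max(candidates)[1]

# Priorities observed in the logs for MPI_COMM_WORLD. sm's priority was
# raised to 99 on the command line, but it never reaches comm_select
# because it disqualified itself (the locality bug discussed here).
priorities = {"tuned": 30, "libnbc": 10, "basic": 10}
components = ["tuned", "sm", "libnbc", "hierarch", "basic", "inter", "self"]

# sm wrongly sees 0 local peers, so it drops out before selection.
available = [c for c in components if init_query(c, n_local_peers=0)]
print(comm_select(available, priorities))  # tuned wins with priority 30
```

Under this model, fixing the locality flags would let `init_query("sm", 1)` succeed, and the raised `coll_sm_priority 99` would then make sm win the selection, which matches the behavior Juan reports after the patch.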