If you run with "--mca coll_base_verbose 10" it will display a priority list of the components chosen per communicator created. You will see something like:

  coll:base:comm_select: new communicator: MPI_COMM_WORLD (cid 0)
  coll:base:comm_select: selecting basic, priority 10, Enabled
  coll:base:comm_select: selecting libnbc, priority 10, Enabled
  coll:base:comm_select: selecting tuned, priority 30, Enabled

Here the 'tuned' component has the highest priority, so OMPI will pick its version of a collective operation (e.g., MPI_Bcast), if present, over the version from a lower-priority component. I'm not sure whether any of the components offers something finer-grained that reports which specific collective function is being used.

-- Josh

On Tue, Apr 7, 2020 at 1:59 PM Luis Cebamanos <l.cebama...@epcc.ed.ac.uk> wrote:

Hi Josh,

It makes sense, thanks. Is there a debug flag that prints out which component is chosen?

Regards,
Luis

On 07/04/2020 19:42, Josh Hursey via devel wrote:

Good question. The reason for this behavior is that the Open MPI coll(ective) framework does not require that every component (e.g., 'basic', 'tuned', 'libnbc') implement all of the collective operations. It requires instead that the composition of the available components (e.g., basic + libnbc) provides the full set of collective operations.

This is nice for a collective implementor since they can focus on the collective operations they want in their component, but it does mean that the end-user needs to know about this composition behavior.

The command below will show you all of the available collective components in your Open MPI build.

  ompi_info | grep " coll"

'self' and 'libnbc' probably need to be included in all of your runs, maybe 'inter' as well. The others like 'tuned' and 'basic' may be able to be swapped out. To compare 'basic' vs 'tuned' you can run:

  --mca coll basic,libnbc,self

and

  --mca coll tuned,libnbc,self

It is worth noting that some of the components like 'sync' are utilities that add functionality on top of the other collectives - in the case of 'sync' it will add a barrier before/after N collective calls.

On Tue, Apr 7, 2020 at 10:54 AM Luis Cebamanos via devel <devel@lists.open-mpi.org> wrote:

Hello developers,

I am trying to debug the MCA choices the library is taking for collective operations.
The reason is that I want to force the library to choose a particular module and compare it with a different one. One thing I have noticed is that I can do:

  mpirun --mca coll basic,libnbc --np 4 ./iallreduce

for an "iallreduce" operation, but I get an error if I do

  mpirun --mca coll libnbc --np 4 ./iallreduce

or

  mpirun --mca coll basic --np 4 ./iallreduce

--------------------------------------------------------------------------
Although some coll components are available on your system, none of
them said that they could be used for iallgather on a new communicator.

This is extremely unusual -- either the "basic", "libnbc" or "self"
components should be able to be chosen for any communicator. As such,
this likely means that something else is wrong (although you should
double check that the "basic", "libnbc" and "self" coll components are
available on your system -- check the output of the "ompi_info" command).

A coll module failed to finalize properly when a communicator that was
using it was destroyed.

This is somewhat unusual: the module itself may be at fault, or this
may be a symptom of another issue (e.g., a memory problem).

  mca_coll_base_comm_select(MPI_COMM_WORLD) failed
  --> Returned "Not found" (-13) instead of "Success" (0)

Can you please help?

Regards,
Luis

The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.

--
Josh Hursey
IBM Spectrum MPI Developer
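
For anyone who wants to compare runs quickly, the "coll_base_verbose" output quoted at the top of this thread can be summarized with a small script. The following is only a minimal sketch: it assumes the log lines look exactly like the ones quoted above (other Open MPI versions may format them differently), and the names SELECT_RE, enabled_components and highest_priority are made up for illustration, not part of Open MPI. It also merges the lines from all communicators into one list.

```python
import re

# Assumed line shape, taken from the output quoted in this thread:
#   coll:base:comm_select: selecting tuned, priority 30, Enabled
SELECT_RE = re.compile(
    r"coll:base:comm_select: selecting\s+(\w+), priority\s+(-?\d+), (\w+)"
)


def enabled_components(log_text):
    """Return (name, priority) pairs for every component reported as Enabled."""
    return [
        (name, int(priority))
        for name, priority, state in SELECT_RE.findall(log_text)
        if state == "Enabled"
    ]


def highest_priority(log_text):
    """Return the enabled component with the highest priority, or None."""
    components = enabled_components(log_text)
    if not components:
        return None
    return max(components, key=lambda item: item[1])[0]


# Sample input copied from the verbose output shown in the thread.
SAMPLE = """\
coll:base:comm_select: new communicator: MPI_COMM_WORLD (cid 0)
coll:base:comm_select: selecting basic, priority 10, Enabled
coll:base:comm_select: selecting libnbc, priority 10, Enabled
coll:base:comm_select: selecting tuned, priority 30, Enabled
"""

if __name__ == "__main__":
    print(enabled_components(SAMPLE))
    print(highest_priority(SAMPLE))
```

Keep in mind that, as described above, the highest-priority component only wins for the collective operations it actually provides, so this summary shows the preference order and not which function ends up being called for a given operation.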