Thanks George,
I'll search the mailing list archives for those examples, though I
suspect they are not going to be easy to find.
Regards,
Luis
On 07/04/2020 22:29, George Bosilca via devel wrote:
All the collective decisions are made at the first collective call on
each communicator. So basically you can change the MCA parameter or pvar
before the first collective on a communicator to affect how the
selection is made. I have posted a few examples over the years on the
mailing list.
George.
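A minimal sketch of the simplest case follows (it is not one of the examples George refers to). MCA parameters can be set through OMPI_MCA_-prefixed environment variables before launch, so they are in place before the first collective on every communicator. The parameters coll_tuned_use_dynamic_rules and coll_tuned_bcast_algorithm belong to the 'tuned' component; the algorithm value 1 is only a placeholder, and changing the choice for a single communicator at run time is not covered here.

```shell
# Sketch: set tuned-component MCA parameters through the environment so they
# are already in place at MPI_Init, and therefore before the first collective.
# The algorithm number is a placeholder; valid values depend on the Open MPI
# version (see: ompi_info --param coll tuned --level 9).
export OMPI_MCA_coll_tuned_use_dynamic_rules=1
export OMPI_MCA_coll_tuned_bcast_algorithm=1

# Show what a subsequently launched mpirun would inherit.
env | grep '^OMPI_MCA_coll_tuned_' | sort
```

An mpirun started from the same shell picks these values up just as if they had been passed with "--mca" on the command line.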
On Tue, Apr 7, 2020 at 3:44 PM Josh Hursey via devel
<devel@lists.open-mpi.org> wrote:
If you run with "--mca coll_base_verbose 10" it will display a
priority list of the components chosen per communicator created.
You will see something like:
coll:base:comm_select: new communicator: MPI_COMM_WORLD (cid 0)
coll:base:comm_select: selecting basic, priority 10, Enabled
coll:base:comm_select: selecting libnbc, priority 10, Enabled
coll:base:comm_select: selecting tuned, priority 30, Enabled
Here the 'tuned' component has the highest priority, so OMPI will pick
its version of a collective operation (e.g., MPI_Bcast), if present,
over the version provided by a lower-priority component.
I'm not sure whether any of the components offers finer-grained output
showing which specific collective function is being used.
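As a rough sketch, the priority list above can be reduced to the winning component per communicator with a small filter. The helper name top_coll_component is hypothetical, and the parsing assumes the verbose lines look like the sample shown above (real output may carry a host/pid prefix, which the sed step strips).

```shell
# Hypothetical helper: read coll_base_verbose output on stdin and print, for
# each communicator, the enabled component with the highest priority.
# Assumes the line format shown in the sample output above.
top_coll_component() {
    sed -n 's/^.*coll:base:comm_select: //p' | awk '
        $1 == "new" && $2 == "communicator:" {
            if (comm != "") print comm ": " best " (priority " max ")"
            comm = $3; best = "none"; max = -1
            next
        }
        $1 == "selecting" && $NF == "Enabled" {
            name = $2; sub(/,$/, "", name)
            prio = $4; sub(/,$/, "", prio)
            if (prio + 0 > max) { max = prio + 0; best = name }
        }
        END { if (comm != "") print comm ": " best " (priority " max ")" }
    '
}

# Usage (requires an Open MPI installation):
#   mpirun --mca coll_base_verbose 10 --np 4 ./iallreduce 2>&1 | top_coll_component
```

This only reports which component ranks highest; it says nothing about which component supplies each individual collective function.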
-- Josh
On Tue, Apr 7, 2020 at 1:59 PM Luis Cebamanos
<l.cebama...@epcc.ed.ac.uk> wrote:
Hi Josh,
It makes sense, thanks. Is there a debug flag that prints out
which component is chosen?
Regards,
Luis
On 07/04/2020 19:42, Josh Hursey via devel wrote:
Good question. The reason for this behavior is that the Open
MPI coll(ective) framework does not require that every
component (e.g., 'basic', 'tuned', 'libnbc') implement all of
the collective operations. It requires instead that the
composition of the available components (e.g., basic +
libnbc) provides the full set of collective operations.
This is nice for a collective implementor since they can
focus on the collective operations they want in their
component, but it does mean that the end-user needs to know
about this composition behavior.
The command below will show you all of the available
collective components in your Open MPI build.
ompi_info | grep " coll"
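If only the component names are wanted, the same output can be filtered a little further. This is a sketch that assumes ompi_info prints lines of the form "MCA coll: <name> (...)"; the helper name list_coll_components is hypothetical.

```shell
# Hypothetical helper: reduce ompi_info output (on stdin) to a comma-separated
# list of coll component names, the same form that "--mca coll" accepts.
# Assumes lines shaped like: "MCA coll: tuned (MCA v..., API v..., Component v...)"
list_coll_components() {
    awk '$1 == "MCA" && $2 == "coll:" { print $3 }' | paste -sd, -
}

# Usage (requires an Open MPI installation):
#   ompi_info | list_coll_components
```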
'self' and 'libnbc' probably need to be included in all of
your runs, maybe 'inter' as well. The others like 'tuned' and
'basic' may be able to be swapped out.
To compare 'basic' vs 'tuned' you can run:
--mca coll basic,libnbc,self
and
--mca coll tuned,libnbc,self
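Spelled out as full command lines, the comparison might look like the sketch below. The helper coll_cmd is hypothetical and only prints the commands (a dry run); the "--np 4 ./iallreduce" arguments are borrowed from the original question.

```shell
# Hypothetical helper: print the mpirun command for one primary coll
# component, always keeping libnbc and self in the list as suggested above.
coll_cmd() {
    primary=$1
    shift
    printf 'mpirun --mca coll %s,libnbc,self %s\n' "$primary" "$*"
}

# Dry run: show both command lines so they can be inspected before launching.
for c in basic tuned; do
    coll_cmd "$c" --np 4 ./iallreduce
done
```

Running the two printed commands back to back, with everything except the primary component held fixed, keeps the comparison between 'basic' and 'tuned' like for like.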
It is worth noting that some of the components like 'sync'
are utilities that add functionality on top of the other
collectives - in the case of 'sync' it will add a barrier
before/after N collective calls.
On Tue, Apr 7, 2020 at 10:54 AM Luis Cebamanos via devel
<devel@lists.open-mpi.org> wrote:
Hello developers,
I am trying to debug the MCA choices the library is making for
collective operations, because I want to force the library to choose a
particular module and compare it with a different one.
One thing I have noticed is that I can do:
mpirun --mca coll basic,libnbc --np 4 ./iallreduce
for an "iallreduce" operation, but I get an error if I do
mpirun --mca coll libnbc --np 4 ./iallreduce
or
mpirun --mca coll basic --np 4 ./iallreduce
--------------------------------------------------------------------------
Although some coll components are available on your system, none of
them said that they could be used for iallgather on a new communicator.
This is extremely unusual -- either the "basic", "libnbc" or "self" components
should be able to be chosen for any communicator. As such, this
likely means that something else is wrong (although you should double
check that the "basic", "libnbc" and "self" coll components are available on
your system -- check the output of the "ompi_info" command).
A coll module failed to finalize properly when a communicator that was
using it was destroyed.
This is somewhat unusual: the module itself may be at fault, or this
may be a symptom of another issue (e.g., a memory problem).
mca_coll_base_comm_select(MPI_COMM_WORLD) failed
--> Returned "Not found" (-13) instead of "Success" (0)
Can you please help?
Regards,
Luis
The University of Edinburgh is a charitable body,
registered in Scotland, with registration number SC005336.
--
Josh Hursey
IBM Spectrum MPI Developer
--
~~~~~~~~~~~~~~~~~~~~~~~~~ | EPCC | ~~~~~~~~~~~~~~~~~~~~~~~~~
Luis Cebamanos, HPC Applications Consultant
Email: l.cebama...@epcc.ed.ac.uk Phone: +44 (0) 131 651 3479
http://www.epcc.ed.ac.uk/
The Bayes Centre, 47 Potterrow, Edinburgh UK
EH8 9BT
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~