Thanks George,

I'll search the mailing list archives for those examples, though I suspect they won't be easy to find.

Regards,
Luis

On 07/04/2020 22:29, George Bosilca via devel wrote:
All collective decisions are made at the first collective call on each communicator. So you can change the MCA parameter or pvar before the first collective on a communicator to affect how the selection is made. I have posted a few examples on the mailing list over the years.

  George.


On Tue, Apr 7, 2020 at 3:44 PM Josh Hursey via devel <devel@lists.open-mpi.org> wrote:

    If you run with "--mca coll_base_verbose 10" it will display a
    priority list of the components chosen per communicator created.
    You will see something like:
    coll:base:comm_select: new communicator: MPI_COMM_WORLD (cid 0)
    coll:base:comm_select: selecting       basic, priority  10, Enabled
    coll:base:comm_select: selecting      libnbc, priority  10, Enabled
    coll:base:comm_select: selecting       tuned, priority  30, Enabled

    Here the 'tuned' component has the highest priority, so OMPI
    will pick its version of a collective operation (e.g., MPI_Bcast),
    if present, over that of a lower-priority component.
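To make the priority ordering concrete, here is a minimal Python sketch that parses verbose output of the shape shown above and ranks the enabled components. It assumes the line format is exactly as in the sample; the names `SELECT_RE` and `enabled_components` are hypothetical helpers, not part of Open MPI, and the sketch does not separate output belonging to different communicators.

```python
import re

# Matches lines such as:
#   coll:base:comm_select: selecting       tuned, priority  30, Enabled
# The format is assumed from the sample output quoted in this thread.
SELECT_RE = re.compile(
    r"coll:base:comm_select:\s+selecting\s+(\w+),\s+priority\s+(\d+),\s+(\w+)"
)


def enabled_components(log_text):
    """Return (name, priority) pairs for enabled components, highest priority first."""
    found = []
    for line in log_text.splitlines():
        match = SELECT_RE.search(line)
        if match and match.group(3) == "Enabled":
            found.append((match.group(1), int(match.group(2))))
    # sorted() is stable, so components with equal priority keep their log order.
    return sorted(found, key=lambda pair: pair[1], reverse=True)


sample = """\
coll:base:comm_select: new communicator: MPI_COMM_WORLD (cid 0)
coll:base:comm_select: selecting       basic, priority  10, Enabled
coll:base:comm_select: selecting      libnbc, priority  10, Enabled
coll:base:comm_select: selecting       tuned, priority  30, Enabled
"""

ranked = enabled_components(sample)
print(ranked)
```

The first entry of `ranked` is the component whose implementation would be preferred for any operation it provides.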

    I'm not sure whether any of the components offers something
    finer-grained that reports which specific collective function is
    being used.

    -- Josh


    On Tue, Apr 7, 2020 at 1:59 PM Luis Cebamanos
    <l.cebama...@epcc.ed.ac.uk> wrote:

        Hi Josh,

        It makes sense, thanks. Is there a debug flag that prints out
        which component is chosen?

        Regards,
        Luis


        On 07/04/2020 19:42, Josh Hursey via devel wrote:
        Good question. The reason for this behavior is that the Open
        MPI coll(ective) framework does not require that every
        component (e.g., 'basic', 'tuned', 'libnbc') implement all of
        the collective operations. It requires instead that the
        composition of the available components (e.g., basic +
        libnbc) provides the full set of collective operations.

        This is nice for a collective implementor since they can
        focus on the collective operations they want in their
        component, but it does mean that the end-user needs to know
        about this composition behavior.
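The composition rule can be illustrated with a small Python model. This is only a sketch under stated assumptions: the per-component operation sets in `PROVIDES` are made up for illustration and are not read from Open MPI, the priorities are the ones reported by the verbose output quoted in this thread, and `select` is a hypothetical helper rather than the library's actual selection code.

```python
# Illustrative operation sets only (assumption): NOT read from Open MPI.
PROVIDES = {
    "basic": {"bcast", "allreduce", "allgather"},
    "libnbc": {"ibcast", "iallreduce", "iallgather"},
    "tuned": {"bcast", "allreduce"},
}
# Priorities as reported by the coll_base_verbose output quoted in this thread.
PRIORITY = {"basic": 10, "libnbc": 10, "tuned": 30}
# The full set of operations the chosen components must cover (also illustrative).
REQUIRED = {"bcast", "allreduce", "allgather", "ibcast", "iallreduce", "iallgather"}


def select(components):
    """Map each required operation to the highest-priority component providing it.

    Raises LookupError if some required operation has no provider, which
    mirrors the "Not found" failure when the composition is incomplete.
    """
    table = {}
    for op in sorted(REQUIRED):
        providers = [c for c in components if op in PROVIDES[c]]
        if not providers:
            raise LookupError(f"no component provides {op}")
        table[op] = max(providers, key=lambda c: PRIORITY[c])
    return table


for choice in (["basic", "libnbc"], ["tuned", "basic", "libnbc"], ["basic"]):
    try:
        print(choice, "->", select(choice))
    except LookupError as err:
        print(choice, "-> selection failed:", err)
```

In this model 'basic' alone fails because nothing covers the non-blocking operations, while 'basic' plus 'libnbc' together cover the full set.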

        The command below will show you all of the available
        collective components in your Open MPI build.
        ompi_info | grep " coll"

        'self' and 'libnbc' probably need to be included in all of
        your runs, maybe 'inter' as well. The others like 'tuned' and
        'basic' may be able to be swapped out.

        To compare 'basic' vs 'tuned' you can run:
         --mca coll basic,libnbc,self
        and
         --mca coll tuned,libnbc,self
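If you want to script the comparison, a minimal Python sketch for building the two command lines could look like this. It only assembles and prints the commands; it does not run them. The helper name `mpirun_command` is hypothetical, and the defaults (4 processes, the ./iallreduce binary) are taken from the original question. The optional coll_base_verbose setting is the one mentioned elsewhere in this thread.

```python
import shlex


def mpirun_command(coll_components, np=4, program="./iallreduce", verbose=None):
    """Build an mpirun argument list restricting the coll framework to the given components."""
    argv = ["mpirun", "--mca", "coll", ",".join(coll_components)]
    if verbose is not None:
        # Ask the coll framework to report its component selection.
        argv += ["--mca", "coll_base_verbose", str(verbose)]
    argv += ["--np", str(np), program]
    return argv


# Swap only the first component; keep 'libnbc' and 'self' in both runs.
for first in ("basic", "tuned"):
    print(shlex.join(mpirun_command([first, "libnbc", "self"], verbose=10)))
```

Each returned list can be passed directly to `subprocess.run` on a machine where mpirun is installed.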

        It is worth noting that some of the components like 'sync'
        are utilities that add functionality on top of the other
        collectives - in the case of 'sync' it will add a barrier
        before/after N collective calls.



        On Tue, Apr 7, 2020 at 10:54 AM Luis Cebamanos via devel
        <devel@lists.open-mpi.org> wrote:

            Hello developers,

            I am trying to debug the MCA choices the library makes for
            collective operations, because I want to force the library
            to choose a particular module and compare it with a
            different one.
            One thing I have noticed is that I can do:

            mpirun --mca coll basic,libnbc  --np 4 ./iallreduce

            for an "iallreduce" operation, but I get an error if I do

            mpirun --mca coll libnbc  --np 4 ./iallreduce
            or
            mpirun --mca coll basic  --np 4 ./iallreduce

            
            --------------------------------------------------------------------------
            Although some coll components are available on your system, none of
            them said that they could be used for iallgather on a new communicator.

            This is extremely unusual -- either the "basic", "libnbc" or "self"
            components should be able to be chosen for any communicator.  As such,
            this likely means that something else is wrong (although you should
            double check that the "basic", "libnbc" and "self" coll components are
            available on your system -- check the output of the "ompi_info" command).

            A coll module failed to finalize properly when a communicator that was
            using it was destroyed.

            This is somewhat unusual: the module itself may be at fault, or this
            may be a symptom of another issue (e.g., a memory problem).

              mca_coll_base_comm_select(MPI_COMM_WORLD) failed
               --> Returned "Not found" (-13) instead of "Success" (0)


            Can you please help?

            Regards,
            Luis
            The University of Edinburgh is a charitable body,
            registered in Scotland, with registration number SC005336.



-- Josh Hursey
        IBM Spectrum MPI Developer





--

~~~~~~~~~~~~~~~~~~~~~~~~~ | EPCC | ~~~~~~~~~~~~~~~~~~~~~~~~~

Luis Cebamanos, HPC Applications Consultant
Email: l.cebama...@epcc.ed.ac.uk Phone: +44 (0) 131 651 3479
http://www.epcc.ed.ac.uk/
The Bayes Centre, 47 Potterrow, Edinburgh UK
EH8 9BT

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
