Dahai Guo,
On 10/7/2015 3:08 AM, Dahai Guo wrote:
Thanks, Jeff. It is very helpful. Some more questions :-):
1. There are many coll components, such as basic, tuned, self, cuda,
sm, etc. Are they all selected at MPI_Init time? Or does it only
select those satisfying some criteria (hardware, communicator size)?
Or are only some specific ones selected?
Some components simply disqualify themselves at MPI_Init time,
and some other components are not selected when a communicator is created
(for example, coll/sm cannot be used on a communicator with tasks on
several nodes).
Note that coll_sm_priority is zero by default, so coll/sm is disqualified at
MPI_Init time; hence this is not a perfect example.
ompi/mpi/c/barrier.c calls the barrier subroutine through a function pointer,
and this function pointer was set when the communicator was created.
2. MPI_Barrier seems to choose the exact algorithm in
MPI_Init, since I checked the file ompi/mpi/c/barrier.c and there is
no choice except the inter/intra check. Would you please point out in
which code it is selected, so that I can get some hints about how
algorithms are selected for other MPI collective functions?
Most of the time, coll/tuned is used,
and most of the time, ompi_coll_tuned_XXX_intra_dec_fixed is used;
this function chooses the collective algorithm to be used.
For example, MPI_Barrier invokes
ompi_coll_tuned_barrier_intra_dec_fixed, which chooses and invokes
one of:
- ompi_coll_base_barrier_intra_two_procs
- ompi_coll_base_barrier_intra_bruck
- ompi_coll_base_barrier_intra_recursivedoubling
The best way to understand this part is probably to use a debugger: set
a breakpoint in MPI_Barrier and step as far as required.
In most cases, you will end up using the coll/tuned module on an
intra-communicator, so unless you plan to develop your own collective
module, you can skip the module initialization/selection part.
3. I saw somewhere run-time parameters for choosing algorithms, such
as "--mca coll_tuned_reduce_algorithm 5". Where can I find the
complete list of these runtime options and their valid values?
You can run ompi_info --all and search for coll_tuned_xxx_algorithm,
for example:
MCA coll: parameter "coll_tuned_barrier_algorithm" (current value:
"ignore", data source: default, level: 5 tuner/detail, type: int)
Which barrier algorithm is used. Can be
locked down to choice of: 0 ignore, 1 linear, 2 double ring, 3:
recursive doubling 4: bruck, 5: two proc only, 6: tree
Valid values: 0:"ignore", 1:"linear",
2:"double_ring", 3:"recursive_doubling", 4:"bruck", 5:"two_proc", 6:"tree"
If you want to force the usage of the bruck (4) algorithm, you can run
mpirun --mca coll_tuned_use_dynamic_rules 1 --mca coll_tuned_barrier_algorithm 4 ...
Cheers,
Gilles
Dahai
On Tuesday, October 6, 2015 12:25 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:
On Oct 6, 2015, at 10:19 AM, Dahai Guo <dahaiguo2...@yahoo.com> wrote:
>
> Thanks, Gilles. Some more questions:
>
> 1. How does Open MPI define the priorities of the different
> collective components? What criteria are they based on?
The priorities are in the range of [0, 100] (100=highest). The
priorities tend to be fairly coarse-grained; they're mainly based on
relative knowledge of how good / bad a particular algorithm is going
to be.
> 2. How does an MPI collective function (MPI_Barrier, for example)
> choose the exact algorithm it uses? Based on message size and
> communicator size? Any other factors?
Yes (all of the above). Meaning: each component is responsible for a)
determining whether it will provide a function pointer for each
operation, and b) what that function pointer's priority should be
(same disclaimer as my last mail: I don't remember offhand if there's
a single priority for the whole component, or on a
per-function-pointer/operation basis).
Hence, the component can use whatever criteria it wants to determine
if it wants to provide a function pointer or not. E.g., if it only
has algorithms that work with communicators that have a size that is a
power of 2, then it can use that information to determine whether it
wants to provide a function pointer for a new communicator or not.
> 3. When does MPI_Barrier choose the algorithm? In ompi_mpi_init,
> or every time the application program calls MPI_Barrier?
A combination of: when the communicator is constructed and when the
barrier is run.
I already described the communicator-constructor scenario. But in
addition to that, it's certainly possible to have a collective
operation dispatch to a function that makes a further run-time based
decision (the tuned collective component does a lot of this).
For barrier that wouldn't really be necessary (because you can set up
everything at communicator constructor time: the MPI_BARRIER API
doesn't have any variation in its parameters -- i.e., you know
everything at communicator constructor time). But for other
operations, you might choose different algorithms depending on the
number of local peers, the size of the message, etc. Hence, you
might want to make the final algorithm dispatch decision when
MPI_GATHER is invoked with the final set of parameters.
> 4. Do all the MPI collective functions follow the same procedure to
> choose algorithms in the application program?
I'm not sure how to parse this question.
In general, all MPI collective operations follow the same procedure to
select which component is used at communicator constructor time.
When the collective operation is dispatched off to the module at run
time (e.g., when MPI_BCAST is invoked), it's then up to the module to
decide what to do next (i.e., how to actually effect that collective
operation).
> It would be great if you could point out some main OMPI files and
> functions that are involved in the process.
You might want to step through the selection process with a debugger
to see what happens. Set a breakpoint on mca_coll_base_comm_select()
and step through from there.
> Dahai
>
>
>
> On Tuesday, October 6, 2015 1:08 AM, Gilles Gouaillardet
> <gilles.gouaillar...@gmail.com> wrote:
>
>
> First, you can check the priorities of the various coll modules
> with ompi_info:
>
> $ ompi_info --all | grep \"coll_ | grep priority
> MCA coll: parameter "coll_basic_priority" (current
> value: "10", data source: default, level: 9 dev/all, type: int)
> MCA coll: parameter "coll_inter_priority" (current
> value: "40", data source: default, level: 9 dev/all, type: int)
> MCA coll: parameter "coll_libnbc_priority" (current
> value: "10", data source: default, level: 9 dev/all, type: int)
> MCA coll: parameter "coll_ml_priority" (current value:
> "0", data source: default, level: 9 dev/all, type: int)
> MCA coll: parameter "coll_self_priority" (current
> value: "75", data source: default, level: 9 dev/all, type: int)
> MCA coll: parameter "coll_sm_priority" (current value:
> "0", data source: default, level: 9 dev/all, type: int)
> MCA coll: parameter "coll_tuned_priority" (current
> value: "30", data source: default, level: 6 tuner/all, type: int)
>
>
> coll/tuned is likely the collective module you will be using.
> Then you can check the various ompi_coll_tuned_*_intra_dec_fixed functions in
> ompi/mca/coll/tuned/coll_tuned_decision_fixed.c;
> this is how the tuned collective module selects algorithms based on
> communicator size and message size.
>
> Cheers,
>
> Gilles
>
> On Sun, Oct 4, 2015 at 11:12 AM, Dahai Guo <dahaiguo2...@yahoo.com> wrote:
> > Thanks, Jeff. I am trying to understand in detail how Open MPI works at
> > run time. What main functions does it call to select and initialize the coll
> > components? Using the "helloworld" as an example, how does it select and
> > initialize the MPI_Barrier algorithm? Which C functions are involved and
> > used in the process?
> >
> > Dahai
> >
> >
> >
> > On Friday, October 2, 2015 7:50 PM, Jeff Squyres (jsquyres)
> > <jsquy...@cisco.com <mailto:jsquy...@cisco.com>> wrote:
> >
> >
> > On Oct 2, 2015, at 2:21 PM, Dahai Guo <dahaiguo2...@yahoo.com> wrote:
> >>
> >> Is there any way to trace Open MPI internal function calls in an MPI user
> >> program?
> >
> > Unfortunately, not easily -- other than using a debugger, for example.
> >
> >> If so, can anyone explain it with an example, such as helloworld? I
> >> built Open MPI with the VampirTrace options and compiled the following
> >> program with mpicc-vt, but I didn't get any tracing info.
> >
> > Open MPI is a giant state machine -- MPI_INIT, for example, invokes slightly
> > fewer than a bazillion functions (e.g., it initializes every framework and
> > many components/plugins).
> >
> > Is there something in particular that you're looking for / want to know
> > about?
> >
> >> Thanks
> >>
> >> D. G.
> >>
> >> #include <stdio.h>
> >> #include <mpi.h>
> >>
> >>
> >> int main (int argc, char **argv)
> >> {
> >> int rank, size;
> >>
> >> MPI_Init (&argc, &argv);
> >> MPI_Comm_rank (MPI_COMM_WORLD, &rank);
> >> MPI_Comm_size (MPI_COMM_WORLD, &size);
> >> printf( "Hello world from process %d of %d\n", rank, size );
> >> MPI_Barrier(MPI_COMM_WORLD);
> >> MPI_Finalize();
> >> return 0;
> >> }
> >>
> >> _______________________________________________
> >> devel mailing list
> >> de...@open-mpi.org <mailto:de...@open-mpi.org>
> >> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> >> Link to this post:
> >> http://www.open-mpi.org/community/lists/devel/2015/10/18125.php
> >
> >
> > --
> > Jeff Squyres
> > jsquy...@cisco.com <mailto:jsquy...@cisco.com>
> > For corporate legal information go to:
> > http://www.cisco.com/web/about/doing_business/legal/cri/
>
> >
> >
> >
>
>
>
--
Jeff Squyres
jsquy...@cisco.com <mailto:jsquy...@cisco.com>