Got that. Thank you!

On Thursday, September 18, 2014, Ralph Castain wrote:

> I believe compile-time is preferable as there is a non-zero time impact of
> enabling this code. It's really more for developers to improve scalability
> - if a user is actually interested, I think it isn't that hard for them to
> configure it.
>
>
> On Sep 18, 2014, at 7:16 AM, Artem Polyakov <artpo...@gmail.com> wrote:
>
> Jeff, thank you for the feedback! All of the mentioned issues are clear and I
> will fix them shortly.
>
> One important thing that needs additional discussion is compile-time vs
> runtime selection. Ralph, what do you think about that? Several of the issues
> depend on that decision.
>
> 2014-09-18 20:09 GMT+07:00 Jeff Squyres (jsquyres) <jsquy...@cisco.com>:
>
>> I have a few comments:
>>
>> - This looks nice.  Thanks for the contribution.
>>
>> - I notice that the ORTE timing stuff is now a compile-time decision, not
>> a run-time decision.  Do we care that we've now taken away the ability for
>> users to do timings in a production build?
>>
>> - "clksync" -- can we use "clocksync"?  It's only 2 letters.  We tend to
>> use real words in the OMPI code base; unnecessary abbreviation should be
>> avoided.
>
>
>> - r32738 introduced several files into the code base that have no
>> copyrights, and do not have the standard OMPI copyright header block.
>> Please fix.
>>
>> - There's no documentation on how to use mpisync, mpirun_prof, or
>> ompi_timing_post, even though they're installed when you --enable-timing.
>> What are these 3 executables?  Can we get man pages?
>>
> I posted their description in the first e-mail. Sure, I can prepare man pages
> for them.
>
>
>>
>> - What's the purpose of the MCA param orte_rml_base_timing?  A *quick*
>> look through the code seems to indicate that it is ignored.
>>
>> - What's the purpose of the MCA params opal_clksync_file,
>> opal_timing_file, and opal_timing_overhead?  E.g., what is a "clksync"
>> file, what is it for, and what is its format?  Does the user have to
>> provide one?  If so, how do you get one?  Or is it an output file?
>> ...etc.  The brief descriptions given in the MCA help strings don't really
>> provide enough information for someone who has no idea what the timing
>> stuff is.  Also, can those 3 params have a common prefix?  I.e., it's not
>> obvious that opal_clksync_file is related to opal_timing_* at all.
>
>
>> - A *quick* look at ompi/tools/mpisync shows that a bunch of that code
>> came from an external project.  Is the license compatible with OMPI's
>> license?  What do we need to do to conform to their license?
>>
>> - opal/util/timings.h is protected by OPAL_SYS_TIMING_H -- shouldn't it
>> be OPAL_UTIL_TIMINGS_H?
>>
>> - There's commented-out code in opal/util/timings.h.
>>
>> - There's no doxygen-style documentation in opal/util/timings.h to tell
>> developers how to use it.
>>
>> - There's "TODO" comments in opal/util/timings.c; should those be fixed?
>>
>> - opal_config.h should be the first include in opal/util/timings.c.
>>
>> - If timing support is not to be compiled in, then opal/util/timings.c
>> should not be compiled at all via the Makefile.am (rather than being
>> entirely #if'ed out).
>>
>> It looks like this work is about 95% complete.  Finishing the remaining
>> 5% would make it great and genuinely useful to the rest of the code base.
>>
>> Thanks!
>>
>>
>>
>> On Sep 16, 2014, at 10:20 AM, Artem Polyakov <artpo...@gmail.com> wrote:
>>
>> > Hello,
>> >
>> > I would like to introduce the OMPI timing framework that was committed to
>> > the trunk yesterday (r32738). The code is new, so if you hit any bugs,
>> > just let me know.
>> >
>> > The framework consists of a set of macros and routines for internal
>> > OMPI usage, plus the standalone tool mpisync and a few additional scripts:
>> > mpirun_prof and ompi_timing_post. The feature set is very basic, and I
>> > am open to discussing new features that would be desirable.
>> >
>> > To enable compilation of the framework, configure OMPI with the
>> > --enable-timing option. If the option is passed to ./configure, the
>> > standalone tools and scripts will be installed into <prefix>/bin.
>> >
>> > The timing code is located in OPAL (opal/util/timings.[ch]). There is a
>> > set of macros that should be used so that all mentions of the timing code
>> > are preprocessed out if it wasn't requested with --enable-timing:
>> > OPAL_TIMING_DECLARE(t) - declare a timing handler structure named "t".
>> > OPAL_TIMING_DECLARE_EXT(x, t) - external declaration of a timing
>> > handler "t".
>> > OPAL_TIMING_INIT(t) - initialize timing handler "t".
>> > OPAL_TIMING_EVENT(x) - printf-like event declaration, similar to
>> > OPAL_OUTPUT.
>> > The information about the event is quickly inserted into a linked list.
>> > The maximum length of an event description is limited by
>> > OPAL_TIMING_DESCR_MAX.
>> > Memory is malloc'ed in buckets (OPAL_TIMING_BUFSIZE at once), and the
>> > overhead (time to malloc and prepare the bucket) is recorded in the
>> > corresponding list element. It can be excluded from the timing results
>> > (controlled by the OMPI_MCA_opal_timing_overhead parameter).
>> > OPAL_TIMING_REPORT(enable, t, prefix) - prepare and print out the timing
>> > information. If OMPI_MCA_opal_timing_file was specified, the output goes
>> > to that file. Otherwise the output is directed through opal_output, with
>> > each line prefixed with "prefix" to ease grep'ing. "enable" is a
>> > boolean/integer variable used for runtime selection of what should
>> > be reported.
>> > OPAL_TIMING_RELEASE(t) - the counterpart of OPAL_TIMING_INIT.
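
A minimal sketch of how a macro set like the one above can be preprocessed out when timing is not requested. All DEMO_* names are hypothetical and are not the OPAL API; the fixed-size event array stands in for the linked list, and DEMO_ENABLE_TIMING stands in for whatever symbol --enable-timing defines. It also shows why a printf-like event macro is called with double parentheses: the whole argument list is passed as one macro argument, so no variadic macro support is needed.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical switch; set to 0 to compile every timing call away. */
#ifndef DEMO_ENABLE_TIMING
#define DEMO_ENABLE_TIMING 1
#endif

#define DEMO_DESCR_MAX  64   /* maximum length of one event description */
#define DEMO_EVENTS_MAX 16   /* fixed capacity, for the sketch only */

typedef struct {
    int  count;
    char descr[DEMO_EVENTS_MAX][DEMO_DESCR_MAX];
} demo_timing_t;

/* Record one printf-style event description in the handler. */
static void demo_timing_event(demo_timing_t *t, const char *fmt, ...)
{
    va_list ap;

    assert(t != NULL && fmt != NULL);
    if (t->count >= DEMO_EVENTS_MAX) {
        return;  /* silently drop events once the sketch's buffer is full */
    }
    va_start(ap, fmt);
    vsnprintf(t->descr[t->count], DEMO_DESCR_MAX, fmt, ap);
    va_end(ap);
    t->count++;
}

#if DEMO_ENABLE_TIMING
#define DEMO_TIMING_DECLARE(t) demo_timing_t t
#define DEMO_TIMING_INIT(t)    memset((t), 0, sizeof(*(t)))
/* x is a parenthesised argument list, e.g. ((&tm, "x = %d", x)) */
#define DEMO_TIMING_EVENT(x)   demo_timing_event x
#else
/* Everything expands to nothing, so call sites need no #if of their own. */
#define DEMO_TIMING_DECLARE(t)
#define DEMO_TIMING_INIT(t)
#define DEMO_TIMING_EVENT(x)
#endif
```

With DEMO_ENABLE_TIMING set to 0, a call site such as DEMO_TIMING_EVENT((&tm, "x = %d", x)); is reduced to an empty statement, so no timing code remains in the build.
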
>> >
>> > There are several examples in OMPI code. And here is another simple
>> > example:
>> >     OPAL_TIMING_DECLARE(tm);
>> >     OPAL_TIMING_INIT(&tm);
>> >     ...
>> >     OPAL_TIMING_EVENT((&tm,"Begin of timing: %s",
>> >                        ORTE_NAME_PRINT(&(peer->name)) ));
>> >     ...
>> >     OPAL_TIMING_EVENT((&tm,"Next timing event with condition x = %d", x));
>> >     ...
>> >     OPAL_TIMING_EVENT((&tm,"Finish"));
>> >     OPAL_TIMING_REPORT(enable_var, &tm,"MPI Init");
>> >     OPAL_TIMING_RELEASE(&tm);
>> >
>> >
>> > The output from all OMPI processes (mpirun, orteds, user processes) is
>> > merged together. NTP provides a precision of roughly 100 microseconds to
>> > 1 millisecond. This may not be sufficient to order events globally.
>> > To help developers extract the most realistic picture of what is going
>> > on, additional time synchronisation can be performed before profiling.
>> > The mpisync program should be run with one user process per node to
>> > produce a file with each node's time offset relative to the HNP. If the
>> > cluster runs over Gigabit Ethernet the precision will be 30-50
>> > microseconds; in the case of InfiniBand, 4 microseconds. mpisync produces
>> > an output file that can be read and used by the timing framework
>> > (OMPI_MCA_opal_clksync_file parameter).
>> > The bad news is that this synchronisation is not enough because of
>> > different clock skew on different nodes. Additional periodic
>> > synchronisation is needed. This is planned for the near future (Ralph and
>> > I are discussing possible ways now).
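
A rough sketch of how a per-node offset can be applied to bring timestamps onto one reference clock. The struct, the function name, and the sign convention (offset = node clock minus reference clock) are assumptions made for illustration; the actual mpisync file format and the way the framework consumes it are not described in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event record: which node produced it, and when (seconds). */
typedef struct {
    int    node;      /* index of the node that recorded the event */
    double local_ts;  /* timestamp taken from that node's own clock */
} demo_event_t;

/*
 * Convert a node-local timestamp to the reference clock.
 * Assumption: offsets[n] = (clock of node n) - (clock of the reference node),
 * so the reference node itself has offset 0.
 */
static double demo_to_reference_time(const demo_event_t *ev,
                                     const double *offsets, size_t n_nodes)
{
    assert(ev != NULL && offsets != NULL);
    assert(ev->node >= 0 && (size_t)ev->node < n_nodes);
    return ev->local_ts - offsets[ev->node];
}
```

A single constant offset per node is only valid close to the moment it was measured. Clock skew makes the true offset drift over time, which is why periodic re-synchronisation is mentioned above.
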
>> >
>> > The mpirun_prof and ompi_timing_post scripts can be used to automate clock
>> > synchronisation in the following manner:
>> > export OMPI_MCA_ompi_timing=true
>> > export OMPI_MCA_orte_oob_timing=true
>> > export OMPI_MCA_orte_rml_timing=true
>> > export OMPI_MCA_opal_timing_file=timing.out
>> > mpirun_prof <ompi-params> ./mpiprog
>> > ompi_timing_post timing.out
>> >
>> > ompi_timing_post simply sorts the events and makes all times
>> > relative to the first one.
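
The post-processing step described above (sort by time, then rebase on the earliest event) can be sketched as follows. This is an illustration in C with hypothetical names and an in-memory record type; it is not the ompi_timing_post script and assumes nothing about the format of timing.out.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical merged timing record. */
typedef struct {
    double      ts;     /* absolute timestamp, seconds */
    const char *descr;  /* event description */
} demo_record_t;

/* qsort comparator: ascending by timestamp. */
static int demo_cmp_ts(const void *a, const void *b)
{
    double x = ((const demo_record_t *)a)->ts;
    double y = ((const demo_record_t *)b)->ts;
    return (x > y) - (x < y);
}

/* Sort records by time and make every timestamp relative to the earliest. */
static void demo_sort_and_rebase(demo_record_t *recs, size_t n)
{
    double t0;
    size_t i;

    assert(recs != NULL || n == 0);
    if (n == 0) {
        return;
    }
    qsort(recs, n, sizeof(recs[0]), demo_cmp_ts);
    t0 = recs[0].ts;
    for (i = 0; i < n; i++) {
        recs[i].ts -= t0;
    }
}
```
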
>> >
>> > --
>> > С Уважением, Поляков Артем Юрьевич
>> > Best regards, Artem Y. Polyakov
>> > _______________________________________________
>> > devel mailing list
>> > de...@open-mpi.org
>> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> > Link to this post:
>> http://www.open-mpi.org/community/lists/devel/2014/09/15837.php
>>
>>
>> --
>> Jeff Squyres
>> jsquy...@cisco.com
>> For corporate legal information go to:
>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>
>> _______________________________________________
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post:
>> http://www.open-mpi.org/community/lists/devel/2014/09/15869.php
>
>
>
>
> --
> С Уважением, Поляков Артем Юрьевич
> Best regards, Artem Y. Polyakov
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/09/15870.php
>
>
>

-- 
-----
Best regards, Artem Polyakov
(Mobile mail)
