Got that. Thank you!

On Thursday, September 18, 2014, Ralph Castain wrote:
> I believe compile-time is preferable as there is a non-zero time impact of
> enabling this code. It's really more for developers to improve scalability
> - if a user is actually interested, I think it isn't that hard for them to
> configure it.
>
> On Sep 18, 2014, at 7:16 AM, Artem Polyakov <artpo...@gmail.com> wrote:
>
> Jeff, thank you for the feedback! All of the mentioned issues are clear and
> I will fix them shortly.
>
> One important thing that needs additional discussion is compile-time vs
> runtime selection. Ralph, what do you think about that? Several of the
> issues depend on that decision.
>
> 2014-09-18 20:09 GMT+07:00 Jeff Squyres (jsquyres) <jsquy...@cisco.com>:
>
>> I have a few comments:
>>
>> - This looks nice. Thanks for the contribution.
>>
>> - I notice that the ORTE timing stuff is now a compile-time decision, not
>> a run-time decision. Do we care that we've now taken away the ability for
>> users to do timings in a production build?
>>
>> - "clksync" -- can we use "clocksync"? It's only 2 letters. We tend to
>> use real words in the OMPI code base; unnecessary abbreviation should be
>> avoided.
>>
>> - r32738 introduced several files into the code base that have no
>> copyrights, and do not have the standard OMPI copyright header block.
>> Please fix.
>>
>> - There's no documentation on how to use mpisync, mpirun_prof, or
>> ompi_timing_post, even though they're installed when you --enable-timing.
>> What are these 3 executables? Can we get man pages?
>>
> I posted their description in the first e-mail. Sure, I can prepare man
> pages for them.
>
>> - What's the purpose of the MCA param orte_rml_base_timing? A *quick*
>> look through the code seems to indicate that it is ignored.
>>
>> - What's the purpose of the MCA params opal_clksync_file,
>> opal_timing_file, and opal_timing_overhead?
>> E.g., what is a "clksync" file, what is it for, and what is its format?
>> Does the user have to provide one? If so, how do you get one? Or is it
>> an output file? ...etc. The brief descriptions given in the MCA help
>> strings don't really provide enough information for someone who has no
>> idea what the timing stuff is. Also, can those 3 params have a common
>> prefix? I.e., it's not obvious that opal_clksync_file is related to
>> opal_timing_* at all.
>>
>> - A *quick* look at ompi/tools/mpisync shows that a bunch of that code
>> came from an external project. Is the license compatible with OMPI's
>> license? What do we need to do to conform to their license?
>>
>> - opal/util/timings.h is protected by OPAL_SYS_TIMING_H -- shouldn't it
>> be OPAL_UTIL_TIMINGS_H?
>>
>> - There's commented-out code in opal/util/timings.h.
>>
>> - There's no doxygen-style documentation in opal/util/timings.h to tell
>> developers how to use it.
>>
>> - There are "TODO" comments in opal/util/timings.c; should those be fixed?
>>
>> - opal_config.h should be the first include in opal/util/timings.c.
>>
>> - If timing support is not to be compiled in, then opal/util/timings.c
>> should not be compiled via the Makefile.am (rather than entirely #if'ed
>> out).
>>
>> It looks like this work is about 95% complete. Finishing the remaining
>> 5% would make it great and genuinely useful to the rest of the code base.
>>
>> Thanks!
>>
>> On Sep 16, 2014, at 10:20 AM, Artem Polyakov <artpo...@gmail.com> wrote:
>>
>> > Hello,
>> >
>> > I would like to introduce the OMPI timing framework that was included
>> > into the trunk yesterday (r32738). The code is new, so if you hit any
>> > bugs, just let me know.
>> >
>> > The framework consists of a set of macros and routines for internal
>> > OMPI usage, plus the standalone tool mpisync and a few additional
>> > scripts: mpirun_prof and ompi_timing_post.
>> > The set of features is very basic and I am open to discussing new
>> > things that are desirable there.
>> >
>> > To enable compilation of the framework, configure OMPI with the
>> > --enable-timing option. If the option was passed to ./configure, the
>> > standalone tools and scripts will be installed into <prefix>/bin.
>> >
>> > The timing code is located in OPAL (opal/util/timings.[ch]). There is
>> > a set of macros that should be used so that all mentions of the timing
>> > code are preprocessed out when it wasn't requested with
>> > --enable-timing:
>> > OPAL_TIMING_DECLARE(t) - declare a timing handler structure with name
>> > "t".
>> > OPAL_TIMING_DECLARE_EXT(x, t) - external declaration of a timing
>> > handler "t".
>> > OPAL_TIMING_INIT(t) - initialize timing handler "t".
>> > OPAL_TIMING_EVENT(x) - printf-like event declaration similar to
>> > OPAL_OUTPUT.
>> > The information about the event is quickly inserted into a linked
>> > list. The maximum length of an event description is limited by
>> > OPAL_TIMING_DESCR_MAX.
>> > The malloc is performed in buckets (OPAL_TIMING_BUFSIZE elements at
>> > once), and the overhead (the time to malloc and prepare the bucket) is
>> > accounted for in the corresponding list element. It may be excluded
>> > from the timing results (controlled by the
>> > OMPI_MCA_opal_timing_overhead parameter).
>> > OPAL_TIMING_REPORT(enable, t, prefix) - prepare and print out the
>> > timing information. If OMPI_MCA_opal_timing_file was specified, the
>> > output goes to that file. Otherwise the output is directed through
>> > opal_output, and each line is prefixed with "prefix" to ease
>> > grep'ing. "enable" is a boolean/integer variable that is used for
>> > runtime selection of what should be reported.
>> > OPAL_TIMING_RELEASE(t) - the counterpart of OPAL_TIMING_INIT.
>> >
>> > There are several examples in the OMPI code. Here is another simple
>> > example:
>> > OPAL_TIMING_DECLARE(tm);
>> > OPAL_TIMING_INIT(&tm);
>> > ...
>> > OPAL_TIMING_EVENT((&tm,"Begin of timing: %s",
>> >     ORTE_NAME_PRINT(&(peer->name)) ));
>> > ....
>> > OPAL_TIMING_EVENT((&tm,"Next timing event with condition x = %d", x ));
>> > ...
>> > OPAL_TIMING_EVENT((&tm,"Finish"));
>> > OPAL_TIMING_REPORT(enable_var, &tm,"MPI Init");
>> > OPAL_TIMING_RELEASE(&tm);
>> >
>> > The output from all OMPI processes (mpirun, orted's, user processes)
>> > is merged together. NTP provides precision at the level of 1
>> > millisecond to 100 microseconds. This may not be sufficient to order
>> > events globally.
>> > To help developers extract the most realistic picture of what is going
>> > on, additional time synchronisation may be performed before profiling.
>> > The mpisync program should be run with one user process per node to
>> > produce a file with the time offsets of each node relative to the HNP.
>> > If the cluster runs over Gig Ethernet the precision will be 30-50
>> > microseconds; in the case of Infiniband, 4 microseconds. mpisync
>> > produces an output file that can be read and used by the timing
>> > framework (OMPI_MCA_opal_clksync_file parameter).
>> > The bad news is that this synchronisation is not enough because of
>> > different clock skew on different nodes. Additional periodic
>> > synchronisation is needed. This is planned for the near future (Ralph
>> > and I are discussing possible ways now).
>> >
>> > The mpirun_prof and ompi_timing_post scripts may be used to automate
>> > clock synchronisation in the following manner:
>> > export OMPI_MCA_ompi_timing=true
>> > export OMPI_MCA_orte_oob_timing=true
>> > export OMPI_MCA_orte_rml_timing=true
>> > export OMPI_MCA_opal_timing_file=timing.out
>> > mpirun_prof <ompi-params> ./mpiprog
>> > ompi_timing_post timing.out
>> >
>> > ompi_timing_post will simply sort the events and make all times
>> > relative to the first one.
>> >
>> > --
>> > С Уважением, Поляков Артем Юрьевич
>> > Best regards, Artem Y. Polyakov
>> > _______________________________________________
>> > devel mailing list
>> > de...@open-mpi.org
>> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> > Link to this post:
>> > http://www.open-mpi.org/community/lists/devel/2014/09/15837.php
>>
>> --
>> Jeff Squyres
>> jsquy...@cisco.com
>> For corporate legal information go to:
>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>
>> _______________________________________________
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post:
>> http://www.open-mpi.org/community/lists/devel/2014/09/15869.php
>
> --
> С Уважением, Поляков Артем Юрьевич
> Best regards, Artem Y. Polyakov
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/09/15870.php

--
-----
Best regards, Artem Polyakov (Mobile mail)
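
For readers following the compile-time vs runtime discussion in the quoted thread, here is a minimal stand-alone sketch of how a macro set like the one described there can be preprocessed out entirely. It is not the OPAL implementation: every DEMO_* and demo_* name is hypothetical, DEMO_ENABLE_TIMING merely stands in for configuring with --enable-timing, a fixed-size array replaces the bucketed linked list, and clock() (CPU time) is used only as a portable placeholder for a real wall-clock source.

```c
#include <stdarg.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical switch standing in for "./configure --enable-timing". */
#ifndef DEMO_ENABLE_TIMING
#define DEMO_ENABLE_TIMING 1
#endif

#define DEMO_DESCR_MAX  64   /* maximum length of an event description */
#define DEMO_MAX_EVENTS 16   /* fixed capacity; the real code mallocs buckets */

#if DEMO_ENABLE_TIMING

typedef struct {
    double ts;                      /* timestamp in seconds */
    char   descr[DEMO_DESCR_MAX];   /* formatted description */
} demo_event_t;

typedef struct {
    int          count;
    demo_event_t events[DEMO_MAX_EVENTS];
} demo_timing_t;

static void demo_timing_init(demo_timing_t *t) { t->count = 0; }

/* printf-like event recording; silently drops events beyond capacity. */
static void demo_timing_event(demo_timing_t *t, const char *fmt, ...)
{
    va_list ap;
    demo_event_t *e;
    if (t->count >= DEMO_MAX_EVENTS) return;
    e = &t->events[t->count++];
    e->ts = (double)clock() / CLOCKS_PER_SEC;  /* placeholder clock source */
    va_start(ap, fmt);
    vsnprintf(e->descr, sizeof(e->descr), fmt, ap);
    va_end(ap);
}

/* Prints events relative to the first one; returns how many were printed. */
static int demo_timing_report(int enable, const demo_timing_t *t,
                              const char *prefix)
{
    int i;
    if (!enable) return 0;          /* runtime selection of what to report */
    for (i = 0; i < t->count; i++)
        printf("%s: +%.6f s  %s\n", prefix,
               t->events[i].ts - t->events[0].ts, t->events[i].descr);
    return t->count;
}

#define DEMO_TIMING_DECLARE(t)            demo_timing_t t
#define DEMO_TIMING_INIT(t)               demo_timing_init(t)
/* Double parentheses at the call site carry the whole argument list. */
#define DEMO_TIMING_EVENT(x)              demo_timing_event x
#define DEMO_TIMING_REPORT(en, t, prefix) demo_timing_report((en), (t), (prefix))

#else  /* timing not requested: every macro expands to nothing */

#define DEMO_TIMING_DECLARE(t)
#define DEMO_TIMING_INIT(t)
#define DEMO_TIMING_EVENT(x)
#define DEMO_TIMING_REPORT(en, t, prefix) 0

#endif

/* Mirrors the usage pattern from the quoted example. */
int demo_run(int x)
{
    DEMO_TIMING_DECLARE(tm);
    (void)x;  /* x is unused when timing is compiled out */
    DEMO_TIMING_INIT(&tm);
    DEMO_TIMING_EVENT((&tm, "Begin of timing"));
    DEMO_TIMING_EVENT((&tm, "Next timing event with condition x = %d", x));
    DEMO_TIMING_EVENT((&tm, "Finish"));
    return DEMO_TIMING_REPORT(1, &tm, "demo");
}
```

Calling demo_run() from a main() prints one prefixed line per event. Building with -DDEMO_ENABLE_TIMING=0 makes the macros expand to nothing, so the same function records no events and reports 0, which is the zero-overhead property Ralph argues for.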