Hello, I would like to introduce the OMPI timing framework that was committed to the trunk yesterday (r32738). The code is new, so if you hit any bugs, just let me know.
The framework consists of a set of macros and routines for internal OMPI use, plus the standalone tool mpisync and a few additional scripts: mpirun_prof and ompi_timing_post. The feature set is very basic, and I am open to discussing anything else that would be desirable.

To build the framework, configure OMPI with the --enable-timing option. If the option is passed to ./configure, the standalone tools and scripts are installed into <prefix>/bin.

The timing code is located in OPAL (opal/utils/timing.[ch]). A set of macros preprocesses out all mentions of the timing code when it was not requested with --enable-timing:

OPAL_TIMING_DECLARE(t) - declare a timing handler structure named "t".

OPAL_TIMING_DECLARE_EXT(x, t) - external declaration of a timing handler "t".

OPAL_TIMING_INIT(t) - initialize the timing handler "t".

OPAL_TIMING_EVENT(x) - printf-like event declaration, similar to OPAL_OUTPUT. Information about the event is quickly appended to a linked list. The maximum event description length is limited by OPAL_TIMING_DESCR_MAX. Memory is allocated in buckets (OPAL_TIMING_BUFSIZE elements at a time), and the overhead (the time spent allocating and preparing a bucket) is accounted for in the corresponding list element. It can be excluded from the timing results (controlled by the OMPI_MCA_opal_timing_overhead parameter).

OPAL_TIMING_REPORT(enable, t, prefix) - prepare and print the timing information. If OMPI_MCA_opal_timing_file is specified, the output goes to that file; otherwise it is directed through opal_output, with each line prefixed by "prefix" to ease grep'ing. "enable" is a boolean/integer variable used for runtime selection of what should be reported.

OPAL_TIMING_RELEASE(t) - the counterpart of OPAL_TIMING_INIT.

There are several examples in the OMPI code. Here is another simple one:

OPAL_TIMING_DECLARE(tm);
OPAL_TIMING_INIT(&tm);
...
OPAL_TIMING_EVENT((&tm, "Begin of timing: %s", ORTE_NAME_PRINT(&(peer->name)) ));
...
OPAL_TIMING_EVENT((&tm, "Next timing event with condition x = %d", x ));
...
OPAL_TIMING_EVENT((&tm, "Finish"));
OPAL_TIMING_REPORT(enable_var, &tm, "MPI Init");
OPAL_TIMING_RELEASE(&tm);

Output from all OMPI processes (mpirun, orted's, user processes) is merged together. NTP provides roughly 1 millisecond to 100 microsecond precision, which may not be sufficient to order events globally. To help developers extract the most realistic picture of what is going on, additional time synchronization can be performed before profiling. The mpisync program should be run with one user process per node; it produces a file with each node's time offset relative to the HNP. If the cluster runs over Gigabit Ethernet, the precision is 30-50 microseconds; over InfiniBand it is about 4 microseconds. The output file produced by mpisync can be read and used by the timing framework (via the OMPI_MCA_opal_clksync_file parameter).

The bad news is that this synchronization alone is not enough, because different nodes have different clock skew. Additional periodic synchronization is needed; this is planned for the near future (Ralph and I are discussing possible approaches now).

The mpirun_prof and ompi_timing_post scripts can be used to automate clock synchronization in the following manner:

export OMPI_MCA_ompi_timing=true
export OMPI_MCA_orte_oob_timing=true
export OMPI_MCA_orte_rml_timing=true
export OMPI_MCA_opal_timing_file=timing.out
mpirun_prof <ompi-params> ./mpiprog
ompi_timing_post timing.out

ompi_timing_post simply sorts the events and makes all times relative to the first one.

--
Best regards, Artem Y. Polyakov