[OMPI devel] ORTE Timing
Hello all,

There was some discussion at yesterday's tutorial about ORTE scalability and where bottlenecks might be occurring. I spent some time last night identifying the key information required to answer those questions. I'll be presenting a slide today showing the key timing points we would need first, and I have also begun (this morning) to instrument the trunk to measure those times. Some very quick results, all done on a Mac G5:

1. It takes about 3 milliseconds to set up a job (i.e., go through the RDS, RAS, and RMAPS frameworks, set up the stage gate triggers, prep I/O forwarding, etc. - everything before we actually launch). This value bounces around a lot (I'm just using gettimeofday), but seems to have at most a slight dependence on the number of processes being launched.

2. It takes roughly 1-3 milliseconds to execute the compound command that registers all of the data from an MPI process (i.e., the data sent at the STG1 stage gate). This is the time required on the HNP to process the command - it doesn't include any time spent actually communicating, but it does include time spent packing/unpacking buffers. My tests were all done on a local node for now, so the OOB just passes the buffer across from send to receive. As you would expect, since the info being stored comes from only one process, there is no observable scaling dependence here.

3. The time from the start of MPI_Init until we issue the registry command is about 12-20 milliseconds - again, as expected, with no observable scaling dependence.

There will have to be quite a few more tests, of course, but I don't expect the first two values to change very much (obviously, they will depend on the hardware on the head node). I'll keep you posted as we learn more.

Ralph
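[Editor's note: the post confirms the measurements are taken with gettimeofday; the sketch below is only an illustration of that kind of instrumentation. The elapsed_ms helper and the placement of the timing calls are assumptions, not the actual ORTE trunk code.]

/* Minimal sketch of gettimeofday-based timing instrumentation.
 * elapsed_ms and the placeholder "job setup" region are hypothetical. */
#include <stdio.h>
#include <sys/time.h>

/* Return the elapsed time between two timestamps, in milliseconds. */
static double elapsed_ms(const struct timeval *start, const struct timeval *end)
{
    return (end->tv_sec  - start->tv_sec)  * 1000.0 +
           (end->tv_usec - start->tv_usec) / 1000.0;
}

int main(void)
{
    struct timeval t0, t1;

    gettimeofday(&t0, NULL);
    /* ... the instrumented region would go here, e.g. job setup:
     * RDS, RAS, RMAPS, stage gate triggers, I/O forwarding prep ... */
    gettimeofday(&t1, NULL);

    printf("instrumented region took %.3f ms\n", elapsed_ms(&t0, &t1));
    return 0;
}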
[OMPI devel] problem with MPI_[Pack|Unpack]_external
I've just caught a problem with packing/unpacking using 'external32' on Linux. The problem seems to be byte ordering; I believe you forgot to make the little-endian <-> big-endian conversion somewhere. Below is an interactive session with ipython (sorry, no time to write it in C) showing the problem. Please ignore me if this has already been reported.

In [1]: import numpy
In [2]: from mpi4py import MPI
In [3]: print numpy.dtype('i').itemsize, MPI.INT.extent
4 4
In [4]: print numpy.dtype('b').itemsize, MPI.BYTE.extent
1 1
In [5]: arr1 = numpy.array([256], dtype='i')      # one int, for input
In [6]: print arr1
[256]
In [7]: buf = numpy.array([0,0,0,0], dtype='b')   # four bytes, auxiliary
In [8]: print buf
[0 0 0 0]
In [9]: p = MPI.INT.Pack_external('external32', arr1, buf, 0)
In [10]: print buf, repr(buf.tostring())
[0 1 0 0] '\x00\x01\x00\x00'
In [11]: arr2 = numpy.array([0], dtype='i')       # one int, for output
In [12]: print arr2
[0]
In [13]: p = MPI.INT.Unpack_external('external32', buf, 0, arr2)
In [14]: print arr2
[65536]
In [15]: print arr2.byteswap()
[256]

--
Lisandro Dalcín
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594
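[Editor's note: since the report above is in Python, here is a rough C sketch of an equivalent reproducer; it is an illustration written for this archive, not code from the original report. On a little-endian host, a correct external32 pack of the int 256 should yield the big-endian bytes 00 00 01 00, whereas the session above shows the unswapped native bytes 00 01 00 00.]

/* Pack one int (value 256) with the 'external32' representation and
 * inspect the resulting bytes, then unpack it again.  With a correct
 * implementation the packed buffer should hold 00 00 01 00 on any
 * platform, and the unpacked value should be 256. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int      in  = 256;
    int      out = 0;
    char     buf[4] = {0, 0, 0, 0};
    MPI_Aint pos;

    MPI_Init(&argc, &argv);

    pos = 0;
    MPI_Pack_external("external32", &in, 1, MPI_INT,
                      buf, (MPI_Aint)sizeof(buf), &pos);
    printf("packed bytes: %02x %02x %02x %02x (expected 00 00 01 00)\n",
           (unsigned char)buf[0], (unsigned char)buf[1],
           (unsigned char)buf[2], (unsigned char)buf[3]);

    pos = 0;
    MPI_Unpack_external("external32", buf, (MPI_Aint)sizeof(buf), &pos,
                        &out, 1, MPI_INT);
    printf("unpacked value: %d (expected 256)\n", out);

    MPI_Finalize();
    return 0;
}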