Re: [OMPI devel] [OMPI svn] svn:open-mpi r25323
On Oct 19, 2011, at 6:41 PM, George Bosilca wrote:

> A careful reading of the committed patch, would have pointed out that none of
> the concerns raised so far were true, the "old-way" behavior of the OMPI code
> was preserved.

Then perhaps you could have added some comments to explain the not-obvious semantics, and fewer people would have argued. :-)

> Moreover, every single of the error codes removed were not used in ages.

On the trunk. It is highly likely that no one else is using those codes anywhere, but you can't *know* that. A courtesy RFC is always a good idea here. Indeed, you have railed against exactly this kind of behavior before: people changing things on the trunk that had impact on your private research branches. :-)

> What Brian pointed out as evil, evil being a subjective notion by itself,
> didn't prevent the correct behavior of the code, nor affected in any way it's
> correctness. Anyway, to address his concern I pushed a patch (25333) putting
> the OMPI error codes back where they were originally.

The point was that an RFC would have been a much less controversial way of doing this, especially since the code turned out to be fairly subtle, un-commented, and different from what has been done in the past.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
[OMPI devel] Process ranks
Since people may not be fully familiar, and because things have evolved, I thought it might help to provide a brief explanation of the ranks we assign to processes in OMPI. Each process has four "ranks" assigned to it at launch:

1. vpid - equivalent to its MPI rank within the job. You can access the vpid with ORTE_PROC_MY_NAME->vpid.

2. local_rank - the relative rank of the process, within its own job, on the local node. For example, if there are three processes from this job on the node, then the lowest vpid process would have local_rank=0, the next highest vpid process would have local_rank=1, etc. The local_rank is typically used by the shared memory subsystem to decide which proc will create the backing file. Note that processes from dynamically spawned jobs on the node will have overlapping local_ranks. For example, if a process in the above job were to comm_spawn two more procs on the node, the lowest vpid of those would also have local_rank=0 since it is in a different jobid. Every process has full knowledge of the local_rank for every other process executing within that mpirun AND for any proc that connected to it via MPI connect/accept or comm_spawn (the info is included in the modex during the connect/accept procedure). You can obtain the local_rank of any process using

    orte_local_rank_t orte_ess.get_local_rank(proc_name)

This will return ORTE_LOCAL_RANK_INVALID if the info isn't known.

3. node_rank - the relative rank of the process, spanning all jobs under this mpirun, on the local node. The node_rank is typically used by the OOB to select a static port from the given range, thus ensuring that each proc on the node - regardless of job - takes a unique port. For example, if there are three processes from this job on the node, then the lowest vpid process would have node_rank=0, the next highest vpid process would have node_rank=1, etc. If a process then comm_spawns another process onto the node, the new process will have node_rank=3 since the computation spans -all- jobs. Every process has full knowledge of the node_rank for every other process executing within that mpirun AND for any proc that connected to it via MPI connect/accept or comm_spawn (the info is included in the modex during the connect/accept procedure). You can obtain the node_rank of any process using

    orte_node_rank_t orte_ess.get_node_rank(proc_name)

This will return ORTE_NODE_RANK_INVALID if the info isn't known.

4. app_rank - the relative rank of the process within its app_context. This equates to the vpid for a job that contains only one app_context. However, for jobs with multiple app_contexts, this value provides a way of determining a proc's rank solely within its own app_context. Each process only has access to its own app_rank in orte_process_info - it doesn't have any knowledge of the app_rank for other processes.

HTH
Ralph
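P.S. A minimal (untested) sketch of how the lookups above might be called from inside a proc. The header paths and the pointer-vs-value form of the proc-name argument are my assumptions - check orte/mca/ess/ess.h and orte/runtime/orte_globals.h for the exact declarations; the function and macro names are the ones described above.

    /* Sketch only - header paths and the &peer argument form are assumptions. */
    #include <stdio.h>
    #include "orte/mca/ess/ess.h"
    #include "orte/runtime/orte_globals.h"

    static void show_peer_ranks(orte_vpid_t peer_vpid)
    {
        orte_process_name_t peer;
        orte_local_rank_t local_rank;
        orte_node_rank_t node_rank;

        /* a peer in my own job: same jobid, different vpid */
        peer.jobid = ORTE_PROC_MY_NAME->jobid;
        peer.vpid = peer_vpid;

        local_rank = orte_ess.get_local_rank(&peer);
        node_rank = orte_ess.get_node_rank(&peer);

        if (ORTE_LOCAL_RANK_INVALID == local_rank ||
            ORTE_NODE_RANK_INVALID == node_rank) {
            /* we have no knowledge of this proc */
            return;
        }
        printf("peer vpid %d: local_rank=%d node_rank=%d\n",
               (int)peer_vpid, (int)local_rank, (int)node_rank);
    }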
[OMPI devel] Determining locality
For those wishing to use the new locality functionality, here is a little (hopefully clearer) info on how to do it. A few clarifications first may help:

1. The locality is defined by the precise cpu set to which a process is bound. If the process is not bound, this obviously includes all the available cpus on the node where the process resides.

2. The locality value we return to you is a bitmask where each bit represents a specific layer of common usage between you (the proc in which the call to orte_ess.proc_get_locality is made) and the given process. In other words, if the "socket" bit is set, it means you and the process you specified are both bound to the same socket. Important note: it does -not- mean that the other process is currently executing on the same socket you are executing on at this instant in time. It only means that the OS is allowing that process to use the same socket that you are allowed to use. As the process swaps in/out and moves around, it may or may not be co-located on the socket with you at any given instant. We do not currently provide a way for a process to get the relative locality of two other remote processes. However, the infrastructure supports this, so we can add it if/when someone shows a use-case for it.

3. Every process has locality info for all of its peers AND for any proc that connected to it via MPI connect/accept or comm_spawn (the info is included in the modex during the connect/accept procedure). This is true regardless of launch method, with the exception of cnos (which doesn't have a modex).

With that in mind, let's start with determining if a proc is on the same node. The only way to determine if two procs other than yourself are on the same node is to compare their daemon vpids:

    if (orte_ess.proc_get_daemon(A) == orte_ess.proc_get_daemon(B)), then A and B are on the same node.

However, there are two ways to determine if another proc is on the same node as you. First, you can of course use the above method to determine if you share the same daemon:

    if (orte_ess.proc_get_daemon(A) == ORTE_PROC_MY_DAEMON->vpid), then we are on the same node.

Alternatively, you can use the proc locality, since it contains a "node" bit:

    if (OPAL_PROC_ON_LOCAL_NODE(orte_ess.proc_get_locality(A))), then the proc is on the same node as us.

Similarly, we can determine if another process shares a socket, NUMA node, or other hardware element with us by applying the corresponding OPAL_PROC_ON_xxx macro to the locality returned by calling orte_ess.proc_get_locality for that process.

HTH
Ralph
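P.S. To make the usage concrete, here is a small (untested) sketch of the three checks above. The header paths are my guess; the calls and the OPAL_PROC_ON_LOCAL_NODE macro are exactly the ones described in this note.

    /* Sketch only - header paths are assumptions; see orte/mca/ess/ess.h. */
    #include "orte/mca/ess/ess.h"
    #include "orte/runtime/orte_globals.h"

    /* Are two other procs A and B on the same node?  Compare their daemon vpids. */
    static int procs_share_node(orte_process_name_t *a, orte_process_name_t *b)
    {
        return (orte_ess.proc_get_daemon(a) == orte_ess.proc_get_daemon(b));
    }

    /* Is proc A on my node?  Option 1: compare against my own daemon's vpid. */
    static int proc_on_my_node_via_daemon(orte_process_name_t *a)
    {
        return (orte_ess.proc_get_daemon(a) == ORTE_PROC_MY_DAEMON->vpid);
    }

    /* Is proc A on my node?  Option 2: test the "node" bit in its locality. */
    static int proc_on_my_node_via_locality(orte_process_name_t *a)
    {
        return OPAL_PROC_ON_LOCAL_NODE(orte_ess.proc_get_locality(a));
    }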
[OMPI devel] MPI 2.2 datatypes
In MTT testing, we check the OMPI version number to decide whether to test MPI 2.2 datatypes. Specifically, in intel_tests/src/mpitest_def.h:

    #define MPITEST_2_2_datatype 0
    #if defined(OPEN_MPI)
    #if (OMPI_MAJOR_VERSION > 1) || (OMPI_MAJOR_VERSION == 1 && OMPI_MINOR_VERSION >= 7)
    #undef MPITEST_2_2_datatype
    #define MPITEST_2_2_datatype 1
    #endif
    #endif
    #if MPI_VERSION > 2 || (MPI_VERSION == 2 && MPI_SUBVERSION >= 2)
    #undef MPITEST_2_2_datatype
    #define MPITEST_2_2_datatype 1
    #endif

The check looks for OMPI 1.7 or higher, but we introduced support for MPI 2.2 datatypes in 1.5.4. So, can we check for 1.5.4 or higher? Or is it possible that this support might not go into the first 1.6 release? I'm willing to make the changes, but just wanted some guidance on what to expect in 1.6.
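For concreteness, the change I have in mind would look something like this (untested; I'm assuming OMPI_RELEASE_VERSION is defined alongside OMPI_MAJOR_VERSION and OMPI_MINOR_VERSION):

    /* Treat OMPI 1.5.4 and later as having MPI 2.2 datatype support. */
    #define MPITEST_2_2_datatype 0
    #if defined(OPEN_MPI)
    #if (OMPI_MAJOR_VERSION > 1) || \
        (OMPI_MAJOR_VERSION == 1 && OMPI_MINOR_VERSION > 5) || \
        (OMPI_MAJOR_VERSION == 1 && OMPI_MINOR_VERSION == 5 && OMPI_RELEASE_VERSION >= 4)
    #undef MPITEST_2_2_datatype
    #define MPITEST_2_2_datatype 1
    #endif
    #endif
    #if MPI_VERSION > 2 || (MPI_VERSION == 2 && MPI_SUBVERSION >= 2)
    #undef MPITEST_2_2_datatype
    #define MPITEST_2_2_datatype 1
    #endif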
Re: [OMPI devel] === CREATE FAILURE (trunk) ===
regenerating now...

On Oct 20, 2011, at 7:14 PM, MPI Team wrote:
> 
> ERROR: Command returned a non-zero exist status (trunk):
> make distcheck
> 
> Start time: Thu Oct 20 21:00:02 EDT 2011
> End time: Thu Oct 20 21:14:13 EDT 2011
> 
> ===
> [... previous lines snipped ...]
> make[2]: Entering directory `/home/mpiteam/openmpi/nightly-tarball-build-root/trunk/create-r25345/ompi/ompi/mca/vprotocol'
> make[2]: Leaving directory `/home/mpiteam/openmpi/nightly-tarball-build-root/trunk/create-r25345/ompi/ompi/mca/vprotocol'
> [... remaining quoted "make distcheck" output snipped: repeated make[2] Entering/Leaving directory lines for the other ompi/mca components (common/*, allocator/*, bml/r2, btl/*, ...) in openmpi-1.7a1r25345; the quoted log is truncated before any error appears ...]